
A HYBRID ALGORITHM USING ANT AND BEE COLONY OPTIMIZATION FOR FEATURE SELECTION

AND CLASSIFICATION (AC-ABC HYBRID)

Abstract - Ant Colony Optimization (ACO) and Bee Colony Optimization (BCO) are well-known meta-heuristic search algorithms used in solving numerous combinatorial optimization problems. Feature Selection (FS) helps to speed up the process of classification by extracting the relevant and useful information from the dataset. FS can be seen as an optimization problem, since selecting an appropriate feature subset is critical. This paper proposes a novel algorithm (AC-ABC Hybrid) that combines the characteristics of the Ant Colony and Artificial Bee Colony (ABC) algorithms to optimize feature selection. In the proposed algorithm, the ants use exploitation by the bees to determine the best ant and the best feature subset, and the bees adopt the feature subsets generated by the ants as their food sources. Thirteen UCI (University of California, Irvine) benchmark datasets have been used to evaluate the proposed algorithm. Experimental results show the promising behavior of the proposed method in increasing classification accuracy and selecting an optimal set of features.

Keywords: Feature Selection, Classification, Ant Colony Optimization, Bee Colony Optimization, Artificial Bee Colony, Meta-heuristic search.

1. Introduction

Feature Selection is viewed as an important pre-processing step in data mining, especially for pattern classification [14, 15, 16 and 17]. Datasets record all available information, and processing all of it while classifying a dataset is a tedious task. Classification performance suffers from redundant, irrelevant and noisy features in the feature space, and it is time consuming to process all the information. Feature Selection is the process of extracting the most relevant and useful information, so that the classifier achieves increased prediction accuracy [11, 15 and 16].

Many algorithms exist to implement the process of Feature Selection. In pattern classification, FS techniques fall into one of two categories: the filter approach or the wrapper approach [12]. The filter method uses distance, information, dependency, and consistency measures to evaluate the selected feature subset, whereas the wrapper method uses a classifier and its feedback to evaluate the feature subset [13]. The wrapper idea is sketched below.
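As an illustration only, the following minimal Java sketch scores a candidate feature subset the wrapper way, by the accuracy a classifier attains on the data projected onto that subset; evaluateSubset, project and classifierAccuracy are hypothetical names, and the classifier itself is left as a placeholder.

import java.util.BitSet;

public class WrapperEvaluator {

    // Wrapper score of a candidate subset: the accuracy a classifier
    // achieves when it sees only the features whose bits are set.
    static double evaluateSubset(double[][] data, int[] labels, BitSet subset) {
        return classifierAccuracy(project(data, subset), labels);
    }

    // Keep only the selected feature columns of the dataset.
    static double[][] project(double[][] data, BitSet subset) {
        double[][] out = new double[data.length][subset.cardinality()];
        for (int r = 0; r < data.length; r++) {
            int c = 0;
            for (int f = subset.nextSetBit(0); f >= 0; f = subset.nextSetBit(f + 1)) {
                out[r][c++] = data[r][f];
            }
        }
        return out;
    }

    // Placeholder: plug in any classifier plus cross-validation here.
    // A filter method would instead score the subset with a measure
    // such as distance, information or consistency, with no classifier.
    static double classifierAccuracy(double[][] x, int[] y) {
        return 0.0;
    }
}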

Optimization techniques like Genetic Algorithm (GA), Ant Colony Optimization (ACO), Bee Colony Optimization (BCO) and Particle Swarm Optimization (PSO) have been successfully used to implement and optimize the feature subset selection in numerous application domains [3, 4, 12, 24, 25, 26, 27, 28, 29, 30, 31, 32, 34, 35, 36, 39 and 43].

ACO is a swarm intelligence algorithm inspired by the foraging behavior of ants and is useful in solving discrete optimization problems [1]. BCO is based on the foraging behavior of honey bees and has been successfully employed in optimization problems such as dynamic server allocation in Internet hosting centres, Hex game playing, the Travelling Salesman Problem, telecommunication network routing, vehicle routing, quadratic assignment, graph coloring, job shop scheduling, and machine learning [19, 20]. The Artificial Bee Colony (ABC) algorithm, proposed by Karaboga in 2005, is the most popular and widely used BCO algorithm [8 and 33]. Since its proposal, ABC has been extensively used in solving optimization problems in numerous application domains. ABC is widely adopted because it is simple in concept, easy to implement, and has few control parameters [7]. Also, the ABC algorithm has shown competitive (sometimes superior) performance compared to GA, ACO, and PSO [6].

Although many traditional algorithms and optimization techniques are available to implement and optimize Feature Selection, none of them gives consistent performance over datasets from different application domains. So, innovations are made to the existing algorithms. In this spirit, a novel algorithm that combines the ACO and BCO techniques has been proposed and implemented.


Both ACO and BCO are meta-heuristic search algorithms that depend on information sharing among their colony members to enhance their search processes using a combination of deterministic and probabilistic rules. They are efficient, adaptive, robust, and dynamic search algorithms producing near-optimal solutions [1 and 8]. In the present study, we have synthesized the advantages of both ACO and ABC and successfully combined them to optimize feature subset selection and improve classification performance. In ACO, all the ants are attracted towards the shortest path (optimal solution) by means of pheromone accumulation. In ABC, the bees have the capability of exploiting all the available solutions by using the waggle dance as a communication medium.

This paper is organized in 8 sections. The concepts of Feature Selection and Classification are explained in Section 2. In Section 3, Ant Colony Optimization and works related to ACO based FS are explained. The Artificial Bee Colony algorithm and related works are discussed in Section 4. Section 5 explains the proposed AC-ABC Hybrid algorithm. The experiments are discussed in Section 6 and the results are analyzed in Section 7. Section 8 concludes the paper.

2. Classification and Feature Selection

2.1. Classification

A classifier takes a set of features as input, and these features have different effects on the performance of the classifier. Some features are irrelevant and have no ability to increase the discriminative power of the classifier. Other features are relevant and highly correlated with a specific classification [15, 16].

For classification, carrying extra irrelevant features can be unsafe and risky [14]. A reduced feature subset containing only the relevant features helps in increasing the classification accuracy and reducing the time required for training [28].

2.2. Feature Selection (FS)

High dimensionality of the feature space affects the performance of classification due to the presence of noisy, redundant and irrelevant data. These uninformative features may dominate the informative ones during classification. FS is the process of extracting the relevant and most informative data from the feature space, so that the feature set becomes more suitable for classification. The features are selected in such a way that the original representation of the data is not affected [11]. It has been shown in the literature that "classifications done with feature subsets given as an output of FS have higher prediction accuracy than classifications carried out without FS" [12]. Also, with FS, the representation of the features is made simpler and the learning speed of the classifier is increased [41].

The search for good features in the feature set involves finding those features that are highly correlated with the classes, but are uncorrelated with each other. Locating such optimal features is very complex and hence FS techniques involve heuristic or random search strategies to avoid this prohibitive complexity [41]. In literature, many methods based on evolutionary and meta-heuristic search algorithms like GA, ACO, BCO and PSO have been proposed for optimizing the problem of Feature Selection [3, 4, 12, 24, 25, 26, 27, 28, 29, 30, 31, 32, 34, 35, 36, 39 and 43].

Both ACO and BCO provide an efficient and robust approach to exploring large search spaces; each has individually been employed to optimize the selection of features and has proved an efficient performer. This formed the basis for the present proposal: when the searching and exploitation abilities of the ants and the bees are combined, an enhanced search technique with added meta-heuristics becomes available to provide improved solutions. To the best of our knowledge, the combination of the ACO and BCO techniques has not previously been attempted for feature selection. In the present study, the AC-ABC Hybrid algorithm combines the exploitation behavior of the ACO and ABC algorithms and has shown promising behavior in optimizing feature subset selection.

3. Ant Colony Optimization

M. Dorigo and his colleagues introduced Ant Colony Optimization (ACO) in the early 1990s [1]. ACO exploits the self-organizing principles and the highly coordinated behavior of real ants and employs these techniques in finding solutions to hard combinatorial optimization problems. Ants are social insects living in colonies with an interesting foraging behavior: they can find the shortest path between the nest and a food source by means of path construction and pheromone update [2].

Path Construction: Initially, an ant walks in a random direction in search of food sources. While walking, it deposits a chemical substance called pheromone on the ground, which evaporates with time. Since shorter paths are traversed more quickly and hence more often, they retain more pheromone than longer paths.

Pheromone Update: Ants can smell pheromone and are attracted towards the shorter paths carrying more of it. As more ants keep getting attracted to the shorter paths, pheromone accumulates on those paths. In this way, shorter paths (optimal solutions) are selected by the ants while longer paths (non-optimal solutions) are neglected.

These two phenomena keep driving the ACO algorithm towards promising regions of the search space containing high-quality solutions [1, 2 and 3]; a minimal sketch of the two mechanisms is given below.
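For illustration, a minimal Java sketch of these two mechanisms follows; the constants EVAPORATION and DEPOSIT and the roulette-wheel choice are generic assumptions, not parameters from the cited works.

import java.util.Random;

public class AntPathSketch {
    static final double EVAPORATION = 0.1; // assumed fraction of pheromone lost per step
    static final double DEPOSIT = 1.0;     // assumed pheromone laid on a traversed edge

    // Path construction: pick the next edge with probability proportional
    // to its pheromone level (roulette-wheel selection).
    static int chooseEdge(double[] pheromone, Random rnd) {
        double total = 0;
        for (double p : pheromone) total += p;
        double r = rnd.nextDouble() * total;
        for (int e = 0; e < pheromone.length; e++) {
            r -= pheromone[e];
            if (r <= 0) return e;
        }
        return pheromone.length - 1;
    }

    // Pheromone update: every edge evaporates, the chosen edge is reinforced,
    // so frequently traversed (shorter) paths accumulate pheromone over time.
    static void updatePheromone(double[] pheromone, int chosen) {
        for (int e = 0; e < pheromone.length; e++) pheromone[e] *= (1 - EVAPORATION);
        pheromone[chosen] += DEPOSIT;
    }
}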

In the literature, ACO has been used predominantly to optimize the selection of features [4, 24, 25, 26, 27 and 34]. The ants search for the most important features through the search space based on pheromone accumulation and evaporation over the iterations. In most of these works, the number of ants is set to the number of features and a classifier is used to evaluate the features selected by the ants (wrapper approach); the heuristic related to each feature is obtained from the classifier evaluation through the iterations [4, 24, 26, 27 and 34]. In [25], a filter method is used to evaluate the selected features. If the user has some knowledge about the number of features required for classification, the user can specify lower and upper limits on the size of the feature subset to be selected by the ants; the search space is then defined with this user-specified meta-heuristic [26]. In all these works except [26 and 27], the number of features to be selected is not fixed in advance, so the features with higher pheromone values get added to the subset, and the feature subset that maximizes the prediction accuracy of the classifier, irrespective of size, is obtained as the optimal set of features output by ACO. In [27], a bounded scheme is used to specify the size of the feature subset, which guides the ants in selecting a subset of reduced size; the bound is specified such that feature subsets leading to ineffective solutions are not selected. A hybrid method (a combination of the filter and wrapper methods) is used to frame rules that strengthen the global search ability of the ants, so that features leading to high-quality solutions are selected [27].

The method proposed in the present study is wrapper based, and no constraint is placed on the size of the feature subset to be selected. The number of ants is set to the number of features in the feature set. To start the search process, each ant is assigned a feature subset consisting of a random combination of features. The selection of a feature by an ant depends on the pheromone value of the feature and the heuristic information (the proportion of ants that have selected that particular feature at that instant). The feature subsets obtained from the ants serve as the initial food sources for the employed bees of the bee colony. The best feature subset and the best ant are determined from the feedback obtained from the onlookers. Hence, in the proposed method, ACO serves as the generator of initial food sources for the artificial bee colony.

4. Artificial Bee Colony Algorithm (ABC)

ABC, a swarm intelligence algorithm, was proposed by Karaboga [8] and has since been widely used in many fields for solving optimization problems [5, 6, 7, 9 and 10]. The ABC algorithm employs three types of bees in the colony: Employed Bees (EBees), Onlooker Bees (OBees) and scout bees. Initially, N food source positions are generated. The population of EBees equals the number of food sources, and each employed bee is assigned a food source. EBees exploit the food sources and pass the nectar information to the OBees. The number of OBees equals the number of EBees. Based on the information from the EBees, the OBees exploit the food sources and their neighborhoods until the food sources become exhausted. The employed bee of an exhausted food source becomes a scout, and scouts then start searching for new food source positions. The nectar information represents the quality of the solution available from the food source: a larger amount of nectar increases the probability that a particular food source is selected by the OBees [5]. A minimal sketch of one such generation is given below.
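The sketch below outlines one ABC generation under these rules; the trial limit and the helper methods (evaluate, perturb, randomSource) are illustrative assumptions, not part of the cited algorithm's specification.

import java.util.Random;

public class AbcSketch {
    static final int LIMIT = 20; // assumed trials before a food source counts as exhausted

    // One ABC generation over the food sources (one employed bee per source).
    static void oneGeneration(double[][] sources, double[] nectar, int[] trials, Random rnd) {
        int n = sources.length;
        // Employed bees: exploit their own source and measure its nectar (quality).
        for (int j = 0; j < n; j++) nectar[j] = evaluate(sources[j]);
        // Onlooker bees: pick sources with probability proportional to nectar,
        // then try a neighbouring solution and keep it if it is better.
        for (int b = 0; b < n; b++) {
            int j = rouletteWheel(nectar, rnd);
            double[] candidate = perturb(sources[j], rnd);
            double f = evaluate(candidate);
            if (f > nectar[j]) { sources[j] = candidate; nectar[j] = f; trials[j] = 0; }
            else trials[j]++;
        }
        // Scouts: employed bees of exhausted sources search for new positions.
        for (int j = 0; j < n; j++)
            if (trials[j] > LIMIT) { sources[j] = randomSource(sources[j].length, rnd); trials[j] = 0; }
    }

    static int rouletteWheel(double[] w, Random rnd) {
        double total = 0;
        for (double v : w) total += v;
        double r = rnd.nextDouble() * total;
        for (int j = 0; j < w.length; j++) { r -= w[j]; if (r <= 0) return j; }
        return w.length - 1;
    }

    // Placeholders: problem-specific quality, neighbourhood move and random restart.
    static double evaluate(double[] s) { return 0.0; }
    static double[] perturb(double[] s, Random rnd) { return s.clone(); }
    static double[] randomSource(int dim, Random rnd) { return new double[dim]; }
}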


ABC has been used to solve multi-objective optimization problems and has been shown to provide optimal solutions in air vehicle path planning, design and manufacturing, job shop scheduling, numerical optimization, pattern classification, etc. [8, 20, 37 and 38]. To the best of our knowledge, ABC has been successfully used to optimize the selection of features in [35 and 36]. In both of these works, ABC is hybridized with Rough Set Theory (RST). First, RST analyzes the features in relation to every individual class of the dataset, and the salient features for identifying the individual classes are listed. The features that appear among the salient features of all the classes form the core reduct. Then, ABC is applied to the feature set from which the core reduct has been removed. The resulting optimal feature subset is the combination of the core reduct and the features selected by ABC [35 and 36].

Honey Bees Mating Optimization (HBMO) is another BCO algorithm, based on the marriage behavior of honey bees. The HBMO algorithm has been applied to a financial classification problem related to credit risk assessment: a nearest neighbor classifier is used to classify the credit risk groups and to evaluate the features selected by the queen bees. HBMO has yielded high classification performance with a reduction in the size of the feature set [39].

In the proposed method, the employed bees do not search for food sources (feature subsets); they are presented with the feature subsets produced by the ants. The populations of employed and onlooker bees are equal to the number of features. The employed bees exploit the feature subsets and pass the information to the onlookers. The onlookers select feature subsets based on the heuristic information from the employed bees, and then pass the new selection of features back to the ants for determining the best ant and the best solution.

5. The Proposed Algorithm

The AC-ABC algorithm proposed in this paper synthesizes and synergizes the advantages of both ACO [1 and 4] and ABC [8]. ACO explores the possible solutions available in the search space. ABC exploits the solutions twice, once by the employed bees and then by the OBees, resulting in optimal solutions. The steps of feature selection by the ACO algorithm are shown in Fig. 1 and those of the ABC algorithm in Fig. 2. These two algorithms form the motivation for the proposed AC-ABC hybrid algorithm.

1. Initialize the ACO parameters
2. Create a set of binary bit strings (length equal to the size of the feature set) and assign them to the ants
3. Each ant selects a feature based on the pheromone value and the heuristic information, and generates feature subsets
4. Each ant updates the pheromone values of the features based on the selection
5. Each ant passes the feature subsets to the classifier for evaluation
6. Update the pheromone values using the best ant and the best feature subset
7. Memorize the best feature subset generated
8. Repeat steps 3-5 for a predetermined number of iterations

Fig. 1 ACO for Feature Selection

1. Initialize the Bee Colony parameters
2. Each employed bee generates a feature subset (binary bit string) and exploits it
3. Each onlooker bee selects the feature subsets, evaluates their fitness by passing them to the classifier, generates new feature subsets and exploits them
4. Determine the feature subsets to be neglected and assign their employed bees as scouts for generating new feature subsets
5. Memorize the best feature subset generated
6. Repeat steps 2-5 for a predetermined number of iterations

Fig. 2 ABC for Feature Selection

In the proposed algorithm, the ant system works on the entire feature set of the dataset and presents possible combinations of feature subsets that are optimal. These feature subsets represent the initial food sources for the bee colony. The EBees exploit the feature subsets for the predictive accuracies available from each of them. The objective function and the fitness values for the feature subsets are then calculated. Based on the fitness values, the OBees exploit the subsets and decide the feature subset that yields the best predictive results. On the basis of the resulting feature subset from the onlooker bees, the best ant is decided and the global pheromone update is done. The feature subsets generated by the ants in the next iteration are passed as the newly generated food sources to the bee colony for exploitation. The algorithm halts when the predetermined number of iterations is reached, and the feature subset yielding the maximum possible accuracy is returned as the result of the proposed algorithm. The main steps of the algorithm are listed in Fig. 3.

1. Runs = 1
2. Initialize ACO:
       Set the number of ants (K) equal to the number of features (M)
       For i from 1 to M, assign the pheromone values τ_i to the features
3. Initialize ABC:
       Set the population of EBees and OBees (SN) equal to the number of ants (K)
4. Ant:
       Repeat
           For each feature i, i from 1 to M:
               Calculate the probability p_i of the ant selecting feature i
               Update the pheromone value τ_i of the selected feature
           End For
       Until all the ants have finished
   Repeat
5. Employed Bee:
       Assign the feature subsets selected by each ant (Ant phase) to the employed bees as food source positions (S_j)
       Calculate the objective function (f_j) and the fitness value (fit_j)
       Produce solutions (feature subsets) for the onlookers (v_j)
       Calculate the probability (p_j) to determine the number of onlookers assigned to exploit the feature subsets
6. Onlooker Bee:
       Use the greedy selection mechanism
       Determine the feature subsets to be neglected
       Record the best feature subset by passing the selected subsets to a classifier
       Pass the optimum feature subset to the ant colony
7. Ant:
       Update the pheromone values globally using the optimal feature subset received from the onlookers
8. Runs = Runs + 1
9. Ants:
       Generate new feature subsets from the optimal subsets obtained in the previous run
10. Scout bees:
       EBees of neglected feature subsets become scouts
       Assign the newly generated feature subsets to the scouts
11. Record the best feature subset obtained so far
    Until the predetermined number of runs is attained

Fig. 3 Main Steps of the proposed algorithm (AC-ABC)
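As a rough orientation only, the Java skeleton below mirrors the control flow of Fig. 3; every identifier is illustrative and the phase bodies are stubs standing in for steps 4-10, not the actual implementation.

public class AcAbcSkeleton {
    public static void main(String[] args) {
        int runs = 10; // predetermined number of runs (Table 2)
        boolean[][] subsets = antPhase(null);             // step 4: ants build initial subsets
        boolean[] best = null;
        for (int r = 0; r < runs; r++) {
            double[] fitness = employedBeePhase(subsets); // step 5: f_j and fit_j per subset
            best = onlookerBeePhase(subsets, fitness);    // step 6: greedy selection via classifier
            globalPheromoneUpdate(best);                  // step 7: global update from best subset
            subsets = antPhase(best);                     // step 9: ants regenerate subsets
            scoutBeePhase(subsets);                       // step 10: scouts take the new subsets
        }
        System.out.println("best subset: " + java.util.Arrays.toString(best));
    }

    // Stubs standing in for the phases of Fig. 3; a feature subset is a
    // binary string over the M features, encoded here as a boolean array.
    static boolean[][] antPhase(boolean[] seedSubset) { return new boolean[0][]; }
    static double[] employedBeePhase(boolean[][] subsets) { return new double[0]; }
    static boolean[] onlookerBeePhase(boolean[][] subsets, double[] fitness) { return new boolean[0]; }
    static void globalPheromoneUpdate(boolean[] bestSubset) { }
    static void scoutBeePhase(boolean[][] subsets) { }
}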

AC-ABC Algorithm

The detailed description of the proposed algorithm and the equations involved is as follows. Initially, the population of ants (K) and of EBees and OBees (SN) are set equal to the number of features (M). Pheromone values are assigned to all the features. For the ants to begin the search, each ant is assigned a feature subset consisting of a random combination of features. Each ant then selects a feature based on the probability specified in Eq. (1):

p_i = τ_i · Δτ_i    (1)

where τ_i is the pheromone value of feature i and Δτ_i is the proportion of ants that have selected this feature. Whenever an ant selects a feature, the pheromone value of the feature is updated using Eq. (2):

τ_i = (1 − φ) · τ_i + φ · τ_0    (2)


Here φ is a parameter of relative importance and takes values between 0 and 1. A minimal sketch of this ant phase is given below.
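The following Java sketch illustrates Eqs. (1) and (2): the constants follow Table 2, p_i is treated here as a selection probability, and all other identifiers are illustrative assumptions.

import java.util.Random;

public class AntFeatureSelection {
    static final double PHI = 0.3;   // relative-importance parameter φ (Table 2)
    static final double TAU0 = 0.2;  // initial pheromone τ_0 (Table 2)

    // Eq. (1): selection weight of feature i is p_i = τ_i · Δτ_i, where
    // deltaTau[i] is the proportion of ants currently selecting feature i.
    static boolean selectFeature(int i, double[] tau, double[] deltaTau, Random rnd) {
        double p = tau[i] * deltaTau[i];
        boolean selected = rnd.nextDouble() < p; // p_i interpreted as a probability
        if (selected) {
            // Eq. (2): local pheromone update of the selected feature.
            tau[i] = (1 - PHI) * tau[i] + PHI * TAU0;
        }
        return selected;
    }
}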

After all the ants have finished a run, the feature subsets selected by the ants are passed to the employed bees as the initial food source positions. These subsets represent the food sources of the bee colony:

F(S_j), S_j ∈ R^N    (3)

where the food source S_j represents the jth feature subset, F(S_j), j = 1, 2, ..., SN represents the set of all feature subsets selected by the ants and passed to the bee colony, N is the dimensionality of the bee population, and R^N represents the entire feature space of the problem. The possible predictive accuracy achievable by a feature subset represents its nectar quantity. Using the nectar information, the employed bees compute the fitness value of each food source using Eq. (4):

fit_j = 1 / (1 + f_j)    (4)

where f_j is the objective value of the feature subset S_j; f_j is based on the indiscernibility relation of the feature subset to the classes, i.e. the ability of the feature subset to discern between the classes. Once the fitness values of the feature subsets are known, the onlookers obtain this information from the employed bees and select a feature subset for exploitation. An onlooker bee pointing to a particular feature subset selects a feature subset in the neighborhood based on a probability calculated using Eq. (5):

p_j = fit_j / Σ_{n=1}^{SN} fit_n    (5)

In Eq. (5), fit_j is the fitness of the neighborhood feature subset S_j the onlooker has selected. When a feature subset is selected, the onlooker produces a new solution v_j using Eq. (6):

v_j(S_j) = x_j(S_j) + φ · (x_j(S_j) − x_l(S_l))    (6)

Here x_j(S_j) is the predictive accuracy of the feature subset S_j the onlooker bee is currently exploiting, x_l(S_l) is the predictive accuracy of the feature subset S_l the onlooker bee has selected for further exploitation in the neighborhood, and v_j(S_j) is the new solution produced by the onlooker at its current position when trying to produce new feature subset combinations. If the new solution v_j is greater than the old solution x_j at the subset S_j, then the new solution replaces the old one, i.e. the feature subset S_j is replaced with a new feature subset that combines the features present in both S_j and S_l; otherwise, the old solution x_j is retained. φ is a randomly chosen number in [0, 1], included to control the production of neighborhood feature subsets around S_j by the onlooker bee; it also represents the visual comparison of the two feature subsets S_j and S_l by a bee. In Eq. (6), as the difference between x_j(S_j) and x_l(S_l) decreases, the perturbation on x_j(S_j) also decreases. This makes the step length adaptively shrink as the search approaches the optimal feature subset. A minimal sketch of this bee phase is given below.
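The Java sketch below illustrates Eqs. (4)-(6): fitness, onlookerProbability and newSolution mirror fit_j, p_j and v_j(S_j); expressing the greedy replacement as max(v_j, x_j) is our reading of the text, not a detail stated by the authors.

import java.util.Random;

public class BeePhaseSketch {

    // Eq. (4): fitness of subset j from its objective value f_j.
    static double fitness(double fj) {
        return 1.0 / (1.0 + fj);
    }

    // Eq. (5): probability with which the onlookers pick subset j.
    static double onlookerProbability(int j, double[] fit) {
        double sum = 0;
        for (double f : fit) sum += f;
        return fit[j] / sum;
    }

    // Eq. (6): candidate solution from the accuracies x_j and x_l of the
    // current subset and a neighbouring one. The perturbation shrinks as
    // x_j approaches x_l; the old solution is kept unless the new one is
    // better (greedy selection).
    static double newSolution(double xj, double xl, Random rnd) {
        double phi = rnd.nextDouble();     // φ chosen randomly in [0, 1]
        double vj = xj + phi * (xj - xl);
        return Math.max(vj, xj);           // greedy replacement rule
    }
}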

The production of new feature subsets by the onlooker bees, as represented in Eq. (6), continues for a predetermined number of times. The best feature subset is then decided and the newly generated feature subsets are passed back to the ant colony. Based on the best feature subset returned from the bee colony, the global pheromone update is done using Eq. (7):

τ_i = (1 − ρ) · τ_i + (ρ · 1/K_best)^β    (7)

Here ρ is the rate at which the pheromone assigned to each feature evaporates, and K_best represents the best ant, i.e. the ant whose feature subset has given the highest accuracy in the iteration. With the new pheromone values, the ants carry on their exploitation and generate feature subsets based on the probability calculated using Eq. (1). The newly generated feature subsets serve as the new food sources for the bee colony in the next generation. These steps are executed over the iterations until the best possible feature subset, yielding the highest accuracy, is produced as the result of hybridization. A sketch of the global update follows.
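A minimal sketch of Eq. (7): RHO and BETA follow Table 2, kBest stands for K_best, and restricting the update to the features of the best subset is our reading of step 7 in Fig. 3 rather than an explicit statement in the text.

public class GlobalUpdateSketch {
    static final double RHO = 0.7;  // evaporation rate ρ (Table 2)
    static final double BETA = 0.8; // exponent β (Table 2)

    // Eq. (7): global pheromone update driven by the optimal feature
    // subset returned by the onlookers; kBest corresponds to K_best.
    static void globalUpdate(double[] tau, boolean[] bestSubset, double kBest) {
        for (int i = 0; i < tau.length; i++) {
            if (bestSubset[i]) {
                tau[i] = (1 - RHO) * tau[i] + Math.pow(RHO * (1.0 / kBest), BETA);
            }
        }
    }
}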

Highlights of the proposed algorithm AC-ABC:

- The ant colony utilizes the exploitation procedure of the employed and onlooker bees to determine the best ant and the best feature subset of the generation
- Instead of generating feature subsets, the employed bees adopt the feature subsets produced by the ants
- The scout bees do not generate food sources; they use the feature subsets newly generated by the ants
- The feature subsets generated are evaluated thrice, yielding an optimal feature subset and maximum gain in prediction accuracy
- The meta-heuristic search space converges easily and effectively compared to the individual ant and bee colony algorithms

6. Experiments and Discussion

The datasets used, the implementation, and the results of AC-ABC are discussed in this section.

6.1 Datasets

The proposed AC-ABC Hybrid approach has been implemented and tested on 13 different datasets: Heart-Cleveland, Dermatology, Hepatitis, Lung Cancer, Lymphography, Pima Indian Diabetes, Iris, Wisconsin Breast Cancer, Diabetes, Heart-Statlog, Thyroid, Sonar, and Gene. These datasets are taken from the UCI machine learning repository [21] and are described in Table 1. They were selected because they have been used predominantly in classifier ensemble and feature selection proposals for experimental evidence, and because their numbers of features span a wide range, so that the effect of feature selection by AC-ABC Hybrid is easily visible.

Table 1. Datasets Description

Dataset                   Instances   Features   Classes
Heart-Cleveland              303         14         2
Dermatology                  366         34         6
Hepatitis                    155         19         2
Lung Cancer                   32         56         2
Lymphography                 148         18         4
Pima Indian Diabetes         768          8         2
Iris                         150          4         3
Wisconsin Breast Cancer      699          9         2
Diabetes                     768          9         2
Heart-Statlog                270         13         2
Thyroid                     7200         21         3
Sonar                        208         60         2
Gene                        3175        120         3


6.2 Implementation of the AC-ABC Algorithm

Classification of the datasets is implemented using the WEKA 3.6.3 software from the University of Waikato [18], and feature selection using AC-ABC Hybrid has been implemented in the NetBeans IDE. The classifier employed is a decision tree, implemented using the J48 algorithm.

The values of the ACO and ABC parameters for all the datasets are initialized as shown in Table 2. These values were selected after a number of trials because they gave the best performance of the proposed algorithm. When the algorithm is run, the ants select features and feature subsets are generated using Eq. (1) and Eq. (2). The feature subsets generated by the ants represent the food source positions (possible solutions) for the bee colony, as in Eq. (3). The EBees evaluate the fitness of these solutions using Eq. (4) and pass the information about the solutions to the OBees. The OBees exploit the available solutions and generate new feature subsets using Eq. (5) and Eq. (6); the generated feature subsets are passed to the classifier and, based on the predictive accuracy returned by the classifier, the best feature subset of the iteration is decided. This information is passed from the bees to the ant colony, and on this basis the global pheromone update is performed using Eq. (7). So, in the proposed system, the ants generate and pass the possible solutions to the bees, and the bees evaluate the solutions using the classifier and send the information about the best solution back to the ant colony.

We have used Prediction Accuracy (PA) [15 and 16] as the measure for evaluating the performance of the proposed model. Prediction Accuracy is defined as the percentage of instances correctly identified and labeled by the classifier [15]. 10-fold cross validation [3, 4 and 15] is used to evaluate the prediction performance of the classifier; a minimal sketch of this evaluation is given after Table 2. When AC-ABC Hybrid is applied to the datasets, optimal feature subsets are selected, resulting in increased prediction accuracies.

In order to show that the AC-ABC Hybrid algorithm is statistically significant, we have used the F-Test (MANOVA) to calculate the power level of classification.

Table 2. Parameter Settings of ACO and ABC

Parameter                                          Experimental value
Maximum no. of iterations                          1000
No. of runs                                        10
Population size (no. of ants, EBees, OBees)        No. of features of the dataset
ACO: ρ                                             0.7
ACO: β                                             0.8
ACO: φ                                             0.3
ACO: τ_0                                           0.2
ACO: τ_i                                           Multiple values, varying with dataset
ABC: φ                                             0.4
ABC: dimensionality of the population              N
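For reference, a minimal sketch of the evaluation procedure using the standard WEKA Java API (J48, Evaluation, crossValidateModel and pctCorrect are existing WEKA classes/methods; the file name iris.arff is illustrative):

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CrossValidationDemo {
    public static void main(String[] args) throws Exception {
        // Load a dataset in ARFF format (path is illustrative).
        Instances data = new DataSource("iris.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1); // class = last attribute
        J48 tree = new J48(); // the C4.5 decision-tree learner used in this study
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1)); // 10-fold CV
        System.out.printf("Prediction accuracy: %.2f%%%n", eval.pctCorrect());
    }
}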


7. Results Analysis

With the parameter settings listed in Table 2, the hybrid algorithm proposed in the present study has been run on a machine with an Intel Dual Core CPU, 1 GB RAM, and the Windows XP operating system. For each of the thirteen datasets listed in Table 1, the hybrid algorithm was independently executed 10 times; the best and the average results achieved are presented in Table 3. It can be inferred from Table 3 that the size of the feature set has been reduced significantly while retaining good prediction accuracies, in terms of both the best and the average results.

Table 3. Results of AC-ABC Hybrid for 13 Different UCI Datasets

Dataset          Total Features   SF (Best)   PA (Best) (%)   SF (Average)   PA (Average) (%)
Heart-C                14             6           87.29             7             86.92
Dermatology            34            27           98.73            27             98.12
Hepatitis              19            11           79.29            11             78.76
Lung Cancer            56            27           90.83            27             88.04
Lymph                  18            11           81.09            11             80.91
Pima                    8             4           91.92             4             89.89
Iris                    4             2           96.73             2             95.10
Wisconsin               9             3           99.43             3             99.07
Diabetes                9             4           84.16             4             83.84
Heart-Statlog          13             4           87.82             4             84.51
Thyroid                21             9           99.61             9             99.14
Sonar                  60            13           96.74            13             92.46
Gene                  120            24           93.06            24             90.88

Note: PA (Best) is the highest prediction accuracy obtained for each dataset over 10 independent executions; SF (Best) is the number of features selected corresponding to PA (Best). PA (Average) is the mean of the accuracies of the 10 independent executions for each dataset; SF (Average) is the average number of features selected corresponding to PA (Average).

Since the proposed hybrid algorithm aims to optimize feature selection for classification, classification accuracy is considered an important measure for assessing the performance of the proposed method. So, on the basis of Prediction Accuracy (PA), the AC-ABC Hybrid algorithm is compared with other meta-heuristic search based FS methods from the literature. Although Table 3 presents both the best and the average results obtained from the hybrid algorithm, the best results (Prediction Accuracy and Selected Features (SF)) have been used when evaluating the performance of the AC-ABC Hybrid algorithm against the other methods in the literature.

7.1. Performance Comparison with Meta-Heuristic Search based Feature Selection Methods

In the literature, feature selection for classification has been investigated with a large number of meta-heuristic search algorithms. To compare the performance of the proposed method, we have considered some of the methods available in the literature that were evaluated on datasets in common with those used in the present study. The methods considered for comparison are: ACO-NN Hybrid [26], Hybrid ACO [27], ACO Based FS [28], PSO-SVM [29], CatFish Binary PSO [31] and IQR Bee [36]. In Table 4, the performance of the proposed AC-ABC Hybrid is compared with that of these methods.

7.1.1. Comparative Analysis with Hybrid ACO [27]

In Hybrid ACO, a new hybrid meta-heuristic algorithm for feature selection has been implemented using neural networks. The advantages of the filter and wrapper methods are combined and, to facilitate the search process, a new set of rules has been designed for pheromone update and meta-heuristic information measurement.

On comparing Hybrid ACO with AC-ABC Hybrid:



- A bounded scheme is used to determine the size of the feature subset in Hybrid ACO, whereas in the proposed method there is no restriction on the size of the feature subset.
- Hybrid ACO has been executed 20 independent times and the average results are reported, whereas AC-ABC Hybrid has been executed for 10 independent runs and both the best and the average results are presented.
- The initial values assigned to the ACO control parameters and the iteration threshold differ between the two methods.

Hybrid ACO [27] has four datasets (Wisconsin Breast Cancer, Thyroid, Sonar, and Gene) in common with the proposed methodology, and the comparison is shown in Table 4A. In terms of feature selection, Hybrid ACO leads, having produced feature subsets of much smaller size; but in terms of accuracy, AC-ABC Hybrid performs better on all four datasets. The reason for the smaller subset size is that Hybrid ACO sets a bound on the size of the feature subset to be selected, whereas in the proposed method the feature subset yielding the highest predictive accuracy is selected irrespective of its size. Good prediction is the preferred criterion in the proposed method.

Table 4. Performance Comparison of the proposed AC-ABC Hybrid with other Meta-Heuristic Approaches for Feature Selection and Classification in the Literature

Table 4A. Comparison with Hybrid ACO [27]

Dataset      TF    Hybrid ACO [27]      AC-ABC Hybrid
                   SF      PA (%)       SF     PA (%)
Wisconsin     9    3.50    98.91         3     99.43
Sonar        60    6.25    86.05        13     96.74
Gene        120    7.25    89.20        24     93.06
Thyroid      21    3.00    99.08         4     99.61

Table 4B. Comparison with IQR Bee [36]

Dataset        TF    IQR Bee [36]       AC-ABC Hybrid
                     SF     PA (%)      SF     PA (%)
Heart-C        14     6     86.54        6     87.29
Dermatology    34     7     92.36       27     98.73
Lung Cancer    56     4     83.03       27     90.83
Wisconsin       9     4     88.70        3     99.43

Table 4C. Comparison of the results for the Sonar dataset (TF = 60) with other FS methods

Method                       SF      PA (%)
CatFish Binary PSO [31]      30.2    96.92
PSO_SVM [29]                 34      96.15
ACO-FS [4]                   43.4    100
Hybrid ACO [27]              6.25    86.05
AC-ABC Hybrid                13      96.74

Table 4D. Comparison with ACO-NN Hybrid [26]

Dataset    TF    ACO-NN Hybrid [26]    AC-ABC Hybrid
                 SF      PA (%)        SF     PA (%)
Thyroid    21    14      94.50          4     99.01


Table 4E. Comparison with ACO Based FS [28]

Dataset          TF    ACO Based FS [28]    AC-ABC Hybrid
                       SF      PA (%)       SF     PA (%)
Heart-C          14     8      86.85         6     87.29
Dermatology      34    28      98.35        27     98.73
Hepatitis        19    13      77.45        11     79.29
Lung Cancer      56    33      87.37        27     90.83
Lymph            18    16      78.35        11     81.09
Pima              8     6      89.82         4     91.92
Iris              4     3      93.35         2     96.73
Wisconsin         9     4      87.65         3     99.43
Diabetes          9     6      84.11         4     84.16
Heart-Statlog    13     8      82.12         4     87.82
Thyroid          21    11      98.99         9     99.61
Sonar            60    17      84.73        13     96.74
Gene            120    28      85.30        24     93.06

Note: TF is the total number of features in the dataset, SF is the number of features selected by each method, and PA is the prediction accuracy achieved for each dataset, in percent.

7.1.2. Comparative Analysis with IQR Bee [35]

In IQR Bee [35], a new FS algorithm is proposed based on Rough Set Theory hybridized with Weighted Bee Colony Optimization. On comparing IQR Bee with AC-ABC Hybrid:

- IQR Bee has been executed 10 times independently and the best results are reported.
- The bee population is set to 10 in IQR Bee, whereas in the proposed method the population size is set equal to the number of features in each dataset.
- The initial values assigned to the ABC control parameters differ from those of the proposed method; however, the iteration threshold and the number of runs match for both methods.

The performance comparison with IQR Bee [35] is presented in Table 4B; Heart-C, Dermatology, Lung Cancer, and Wisconsin Breast Cancer are the datasets in common with our work. The proposed method has given higher accuracies than IQR Bee, but for the Dermatology and Lung Cancer datasets, IQR Bee has resulted in fewer features than AC-ABC Hybrid.

7.1.3. Comparison with Catfish Binary PSO [31], PSO_SVM [29], ACO-FS [4], Hybrid ACO [27]

Several methodologies have been evaluated on the Sonar dataset; among them, ACO-FS [4] has yielded the highest classification performance. The comparison is presented in Table 4C.

In CatFish Binary PSO [31], catfish particles are introduced into the binary search space in addition to the original particles; the selected features and the classification accuracy reported are the best results obtained over 5 independent runs.

ACO-FS [4]: the experiments were run on a PC with a 2 GHz CPU and 2 GB RAM. Similar to the proposed methodology, the WEKA tool has been used for classification and the population of ants is set equal to the size of the feature set.

7.1.4. Comparison with ACO-NN Hybrid [26]



The proposed work has only one dataset (Thyroid) in common with ACO-NN Hybrid [26]; AC-ABC Hybrid outperforms ACO-NN Hybrid in both FS and prediction, as shown in Table 4D.

7.1.5. Comparison with ACO Based FS [28]

In our previous work, we implemented the optimization of feature selection using the ACO technique [28]. In ACO, the ants select a feature based on the move probability associated with each feature. The move probability depends on the heuristic information of the pheromone value associated with each feature and the proportion of ants that have selected that particular feature at that instant [4 and 28]. The classification results achieved by ACO based FS [28] are shown in Table 4E for comparison with the proposed method. It can be inferred from Table 4 that AC-ABC Hybrid has resulted in increased classification accuracies compared to the literature works considered [26], [27], [28], [29], [31] and [36]; it loses only on the Sonar dataset, to ACO-FS [4].

7.2. Execution Times of ACO Based FS [28] and AC-ABC Hybrid

The time taken to execute the proposed AC-ABC Hybrid is given in Table 5, in comparison with the time required to execute our previous work, ACO based FS [28]. Both were executed on the same machine (Intel Dual Core CPU, 1 GB RAM and Windows XP operating system). It can be inferred from Table 5 that AC-ABC Hybrid needs only a modest execution time to achieve the improved results, although its execution time is higher than that of ACO based FS. This is because, in the present study, each time the ants decide upon a feature subset, the subset is passed to the employed and onlooker bees for further exploitation, and the best feature subset and the best ant of the iteration are decided based on the information returned from the onlooker bees. At the cost of this increased computation time, AC-ABC Hybrid has produced better results for both FS and classification compared to ACO based FS, as shown in Table 4. It can also be stated that the hybrid AC-ABC algorithm requires very little execution time, with greater performance, when compared to PSO based feature selection [43].

Table 5. Execution Time of the Proposed AC-ABC Hybrid in Comparison with ACO-FS

Dataset          ACO Based FS [28] (seconds)   AC-ABC Hybrid (seconds)
Heart-C                    8.42                        10.09
Dermatology               12.07                        22.33
Hepatitis                 10.01                        19.00
Lung Cancer               29.59                        42.40
Lymph                      9.23                        14.00
Pima                       2.00                         2.56
Iris                       1.49                         2.04
Wisconsin                  4.47                         7.09
Diabetes                   4.21                         8.33
Heart-Statlog              8.59                        11.43
Thyroid                   20.14                        26.52
Sonar                     28.00                        32.50
Gene                     126.31                       200.48

7.3. Measure of F-Test

To validate the statistical significance of the proposed algorithm compared to our previous work, ACO based FS [28], the F-Test has been applied to the results of both algorithms at 95% confidence intervals. Since some of the datasets involve multi-class classification problems, the F-Test has been applied to calculate the power level of classification. The power level for each dataset with the reduced features resulting from both ACO based FS [28] and AC-ABC Hybrid has been calculated and is presented in Table 6. For all 13 datasets considered, it can be seen from Table 6 that the power values are higher for AC-ABC Hybrid than for ACO based FS. As power increases, classification accuracy also increases, which shows the significance of the proposed algorithm.



Table 6. Measure of Power Level of Classification (F-Test MANOVA)

Dataset        ACO Based FS [28]   AC-ABC Hybrid
Heart-C             0.8770             0.9852
Dermatology         0.9186             0.9399
Hepatitis           0.8040             0.8736
Lung Cancer         0.9018             0.9375
Lymph               0.8966             0.9289
Pima                0.8999             0.9130
Iris                0.9244             0.9316
Wisconsin           0.8401             0.9363
Diabetes            0.8865             0.8948
Heart-S             0.8746             0.9044
Thyroid             0.9559             0.9766
Sonar               0.8692             0.9461
Gene                0.8818             0.9676

From the data presented in Tables 3 and 4, it can be inferred that:

i. Feature selection by the proposed method has definitely improved the accuracy of classification.

ii. For all the datasets except Sonar, AC-ABC has given the highest recognition rates compared to the state-of-the-art methods.

iii. For the Heart-C, Dermatology and Wisconsin datasets, there is only a 1% increase in the recognition rate.

iv. For the Hepatitis dataset, there is a 2% increase in classification accuracy compared to ACO.

v. For the Heart-Statlog, Lung Cancer and Lymphography datasets, the selected features are closer to optimal and have resulted in a 5% improvement in classification accuracy.

vi. There is a 3% increase in prediction accuracy for the Pima and Iris datasets.

vii. Performance is marginally increased for the Diabetes and Thyroid datasets, with reduced feature subset sizes.

viii. For the Gene dataset, the proposed method has improved the recognition rate by 4%.

In AC-ABC Hybrid, the reduction of the feature subset size not only speeds up the classification process but also reduces the computation time [43]. The search space is reduced easily and efficiently because the redundant and unwanted features are filtered first by the ants, then by the employed bees, and finally by the onlooker bees.

8. Conclusion

ACO has been used for optimizing FS in numerous applications and is also capable of synchronizing well with other optimization techniques. ABC likewise finds application in a number of domains where optimization is required. In this paper, a novel hybrid algorithm, AC-ABC, is proposed and implemented for feature selection and classification. This hybrid algorithm makes use of the advantages of both the ACO and ABC algorithms, and the results show the promising behavior of the proposed algorithm. The proposed algorithm has resulted in reduced feature subset sizes, increased classification accuracies, low computational complexity and quick convergence.

References

1.   Dorigo, M. and Stutzle, T. (2004) ‘Ant Colony Optimization’ The MIT Press, Massachusetts, USA.



2.   Kavita, Chawla, H.S. and Saini, J.S. (2011) ‘Parametric comparison of Ant colony optimization for edge detection problem’, International Journal of Computational Engineering & Management, Vol. 13, pp. 54 – 58.

3.   Zhang, Z. and Yang, P. (2008) ‘An Ensemble of Classifiers with Genetic Algorithm Based Feature Selection’, IEEE Intelligent Informatics Bulletin, Vol. 9, No. 1, pp. 18-24.

4.   Abd-Alsabour, N. and Randall, M. (2010) 'Feature Selection for Classification Using an Ant Colony System', Proceedings of the Sixth IEEE International Conference on e-Science Workshops, pp. 86–91.

5.   Zou, W., Zhu, Y., Chen, H. and Zhu, Z. (2010) ‘Cooperative Approaches to Artificial Bee Colony Algorithm’, Proceedings of the IEEE International Conference on Computer Application and System Modeling, Vol. 9, pp. 44-48.

6.   El-Abd, M. (2010) 'A Cooperative Approach to the Artificial Bee Colony Algorithm', IEEE.

7.   Bao, L. and Zeng, J. (2009) ‘Comparison and Analysis of the Selection Mechanism in the Artificial Bee Colony Algorithm’, Proceedings of the IEEE Ninth International Conference on Hybrid Intelligent Systems, pp. 411- 416.

8.   Karaboga, D. (2005) ‘An Idea Based on Honey Bee Swarm for Numerical Optimization’, Technical Report-TR06, (Erciyes University, Engineering Faculty, Computer Engineering Department).

9.   Quan, H. and Shi, X. (2008) 'On the Analysis of Performance of the Improved Artificial-Bee-Colony Algorithm', Proceedings of the IEEE Fourth International Conference on Natural Computation, pp. 654–658.

10.   Kang, F., Li, J., Li, H., Ma, Z. and Xu, Q. (2010) 'An Improved Artificial Bee Colony Algorithm', IEEE.

11.   Ahmed, E.F., Yang, W.J. and Abdullah, M.Y. (2009) ‘Novel method of the combination of forecasts based on rough sets’, Journal of Computer Science, Vol. 5, pp. 440-444.

12.   Santana, L.E.A., Canuto, L., Pintro, F. and Vale, K.O. (2010) 'A Comparative Analysis of Genetic Algorithm and Ant Colony Optimization to Select Attributes for an Heterogeneous Ensemble of Classifiers', IEEE, pp. 465–472.

13.   Yu, S. (2003) ‘Feature Selection and Classifier Ensembles: A Study on Hyperspectral Remote Sensing Data’, A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Physics), The University of Antwerp.

14.   Rezaee, M.R., Goedhart, B., Lelieveldt, B.P.F. and Reiber, J.H.C. (1999) ‘Fuzzy feature selection’, Pattern Recognition, Vol. 32, pp. 2011-2019.

15.   Kuncheva, L.I. (2005) ‘Combining Pattern Classifiers, Methods and Algorithms’ Wiley Inter science.

16.   Duda, R.O., Hart, P.E. and Stork, D.G. (2001) 'Pattern Classification', John Wiley & Sons, Inc., 2nd edition.

17.   Molina, L.C., Belanche, L. and Nebot, A. (2002) 'Feature Selection Algorithms: A Survey and Experimental Evaluation', Proceedings of the Second IEEE International Conference on Data Mining, pp. 155–172.

18.   WEKA: A Java Machine Learning Package, Available at: http://www.cs.waikato.ac.nz/˜ml/weka/.

19.   Karaboga, D. and Akay, B. (2009) ‘A survey: algorithms simulating bee swarm intelligence’, Artificial Intelligence Review, Vol. 31, pp. 61–85.

20.   Wong, L., Chong, C.S., Puan C.Y. and Low, M.Y.H. (2008) ‘Bee Colony Optimization Algorithm With Big Valley Landscape Exploitation for Job Shop Scheduling Problems’, Proceedings of the 2008 Winter Simulation Conference IEEE, pp. 2050-2058.

21.   Frank, A. and Asuncion, A. (2010) UCI Machine Learning Repository, Available at: http://archive.ics.uci.edu/ml, Irvine, CA: University of California, School of Information and Computer Science.

22.   Polikar R. (2006) ‘Ensemble based Systems in decision making’, IEEE Circuits and Systems Magazine, Vol. 6, No. 3, pp. 21-45.

23.   Freund, Y. and Schapire, R.E. (1996) 'Experiments with a new boosting algorithm', Proc. of the Thirteenth International Conference on Machine Learning, pp. 148–156.


24.   Kanan, H. and Faez, K. (2008) ‘An improved feature selection method based on ant colony optimization (ACO) evaluated on face recognition system’, Journal of Applied Mathematics and Computation, pp. 716-725.

25.    Robbins, K., Zhang W. and Bertrand, J. (2007) ‘The ant colony algorithm for feature selection in high-dimension gene expression data for disease classification’, Journal of Mathematical Medicine and Biology, Oxford University Press, pp. 413-426.

26.   Sivagaminathan, R.K. and Ramakrishnan, S. (2007) ‘A hybrid approach for feature subset selection using neural networks and ant colony optimization’, Elsevier - Expert Systems with Applications, Vol. 33, pp. 49–60.

27.    Kabir M.M., Shahjahan, M. and Murase, K. (2008) ‘A new hybrid ant colony optimization algorithm for feature selection’, Journal of Applied Soft Computing, Vol. 8, pp. 687–697.

28.   Shunmugapriya, P., Kanmani, S., Devipriya, S., Archana, J. and Pushpa, J. (2012) ‘Investigation on the Effects of ACO Parameters for Feature Selection and Classification’, Proceedings of Springer - The Third International Conference on Advances in Communications, Networks and Computing, LNICST, pp. 136–145.

29.   Tu, C.J., Chuang, L., Chang, J. and Yang, C. (2007) ‘Feature Selection using PSO-SVM’, IAENG International Journal of Computer Science, Vol. 33, No. 1, pp. 18 – 23.

30.   Zhao, W., Wang, G., Wang, H., Chen, H., Dong, H. and Zhao, Z. (2011) 'A Novel Framework for Gene Selection', International Journal of Advancements in Computing Technology, Vol. 3, No. 3, pp. 184–191.

31.    Chuang, L., Tsai, S. and Yang, C. (2011) ‘Catfish Binary Particle Swarm Optimization for Feature Selection’, Proceedings of the International Conference on Machine Learning and Computing IPCSIT, vol. 3, pp. 40 – 44.

32.   Lin, S. and Chen, S. (2009) ‘PSOLDA: A particle swarm optimization approach for enhancing classification accuracy rate of linear discriminant analysis’, Applied Soft Computing, Vol. 9, pp. 1008–1015.

33.   Karaboga, D. Gorkemli, B., Ozturk, C. and Karaboga, N. (2012) ‘A comprehensive survey: artificial bee colony (ABC) algorithm and applications’, Journal of Artificial Intelligence Reviews (online).

34.   Aghdam, M.H., Ghasem-Aghaee, N. and Basiri, M.E. (2009) 'Text feature selection using ant colony optimization', Journal of Expert Systems with Applications, Vol. 36, pp. 6843–6853.

35.   Suguna, N. and Thanushkodi, K.G. (2011) ‘A weighted bee colony optimisation hybrid with rough set reduct algorithm for feature selection in the medical domain’, International Journal of Granular Computing, Rough sets and Intelligent Systems, Vol. 2, No. 2, pp. 123 – 140.

36.   Suguna, N. and Thanushkodi, K.G. (2010) 'A novel Rough Set Reduct Algorithm for Medical Domain based on Bee Colony Optimization', Journal of Computing, Vol. 2, No. 6, pp. 49–54.

37.    Yildiz, A.R. (2012) ‘A new hybrid bee colony optimization approach for global optimization in design and manufacturing’, Applied Soft Computing (Article in Press).

38.   Xu, C., Duan, H. and Liu, F. (2010) ‘Chaotic Artificial Bee Colony Approach to Uninhabited Combat Air Vehicle (UCAV) Path Planning’, Journal of Aerospace Science and Technology, Vol. 14, pp. 535–541.

39.   Marinaki, M., Marinakis, Y. and Zopounidis, C. (2010) 'Honey Bees Mating Optimization algorithm for financial classification problems', Applied Soft Computing, Vol. 10, pp. 806–812.

40.   Sarkar, B.K. and Chakraborty, S.K. (2011) ‘Classification System using Parallel Genetic Algorithm’, International Journal of Innovative Computing and Applications, Vol. 3, No. 4, pp. 223 – 241.

41.   Chrysostomou, K., Chen, S.Y. and Liu, X. (2008) 'Combining multiple classifiers for wrapper feature selection', International Journal of Data Mining, Modeling and Management, Vol. 1, No. 1, pp. 91–102.

42.   Indira, V., Vasanthakumari, R. and Sugumaran, V. (2010) 'Minimum sample size determination of vibration signals in machine learning approach to fault diagnosis using power analysis', Elsevier - Expert Systems with Applications, Vol. 37, pp. 8650–8658.

43.   Wang, X. (2007) 'Feature Selection based on Rough Sets and Particle Swarm Optimization', Journal of Pattern Recognition Letters, Vol. 28, No. 4, pp. 459–471.
