Hyperbox Classifier with
Ant Colony Optimization
G. N. Ramos*, F. Dong, and K. Hirota
Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology
G3-49, 4259 Nagatsuta, Midori-ku, Yokohama-city 226-8502, Japan
*e-mail: ramos@hrt.dis.titech.ac.jp
Abstract – A method, called HACO2 (Hyperbox classification
with Ant Colony Optimization), is proposed for evolving a
classifier for labeled data using hyperboxes and an ant colony
meta-heuristic. It reshapes the hyperboxes in a near-optimal
way, improving the accuracy of the results while preserving
the topological information (inherently associated with
classification) of the data. It also provides a
feature-discriminating ratio to determine which characteristics
are more important for distinguishing classes. The method is
validated using artificial 2D data and then applied to the
benchmark iris database. Both experiments provide results with
over 95% accuracy. Further modifications (automatic parameter
setting), extensions (overcoming initialization shortcomings),
and applications to the field of software assessment are
discussed.
1. Introduction
Pattern classifiers have an underlying geometry [1] in
the feature space, which is directly related to their complexity
and, naturally, to their performance when applied. The
tradeoff, as usual, is that a complex, and therefore slower,
classifier should provide more accurate results than a simpler,
speedier one. Though there are many methods
to create classifiers, not all consider the topological
information. The simplest option for a classifier is an interval
defining a region. In a multi-dimensional space, this is
called a hyperbox [2][3].
A number of hyperboxes can be used as a classifier by
adjusting their positions and dimensions, which clearly
poses a combinatorial problem. The Ant Colony
Optimization (ACO) is a multi-purpose meta-heuristic that
can solve such combinatorial optimization problems [4]. It
has been successfully applied to many theoretical and
real-world problems.
This paper proposes the Hyperbox classification with
Ant Colony Optimization method (HACO2) to evolve a
classifier. It modifies the shape of the hyperboxes to improve
accuracy while maintaining desirable features such as
preserving the inherent topological information and
providing straightforward interpretation and rule extraction.
The resulting classifier may also provide useful insights on
the importance of features for discrimination, in a simple and
understandable way.
The method is applied to a 2D artificial data set for
validation, improving the classification results and providing
accuracy of over 95%. It is then applied to a benchmark
problem, the classification of the iris data set. In this case, it
also provides results with over 95% accuracy and indicates
which features of the data set have a more discriminative
nature.
A brief description of ACO and a definition of hyperboxes
are presented in Section 2; Section 3 proposes and details the
HACO2 method; classification experiments and results are
analyzed in Section 4.
2. Description of Ant Colony Optimization and
Hyperboxes
2.1 A review on Ant Colony Optimization
The ACO meta-heuristic is a population-based approach
for discrete combinatorial optimization problems. It is based
on the way real ant colonies collectively find short paths
[4][5][6][7]. The basic analogy is the synergy of
applying multiple communicating agents to build a solution to
the problem.
Real ants, when foraging for food, communicate with
each other by depositing pheromone on the trail between the
food source and the nest [4][5][6][7][8][9]. The shorter the
trail, the faster the ants will go through it and thus more
pheromone will be deposited. Since ants have a high
probability of following trails with higher pheromone
deposition, the process reinforces itself.
This is a distinctive feature of ACO: the pheromone
matrix works as dynamic memory, indicating how desirable
an object is to the solution [4][5][6][7]. The values are
updated according to the quality of the solutions, so the
process “remembers” good solutions and “forgets” bad ones.
This resembles an elitist approach of Genetic Algorithms
(GAs), in which the chromosomes are first ordered by their
rank [10].
The main characteristics of ACO are positive feedback
(improves speed of finding good solutions), distributed
computation (avoids early convergence) and greedy heuristic
(finds reasonable solutions early in the process) [4][5][6][7].
Due to such characteristics, however, it may be outperformed
by specialized algorithms [4][7].
An ACO algorithm can be simplified into three basic
procedures per iteration: build solutions, local optimization
(an optional step) and pheromone update [4][5][6][7] as
illustrated in Figure 1. Its inherent characteristics
(flexibility and fast convergence) indicate that the ACO
1714
SA-D5-1 SCIS & ISIS 2008
algorithm can be successfully applied to optimizing the
shape of hyperboxes in a classifier to improve its geometry
and, thus, its classification efficiency.
Figure 1 – ACO algorithm flowchart
2.2 Describing Hyperboxes
A hyperbox defines a region in an n-dimensional space
[2][3][11] and is fully described by two vectors, usually its
two extreme points (the lower and upper bounds). Assuming
an n-dimensional space of real numbers (ℜn) and a hyperbox
Hl = (al, bl), where al ≤ bl, a point y is said to be in Hl if:
H = {H1, H2, …, Hl, …, HC}, Hl ⊂ ℜ^n,
y = {y1, y2, …, yk, …, yn} ∈ ℜ^n,
y ∈ Hl ⇔ alk ≤ yk ≤ blk, al, bl ∈ ℜ^n, (1)
where C is the number of hyperboxes and yk is the k-th
attribute of y.
Using this definition it is necessary to have two points,
the boundaries, per hyperbox. The objective of HACO2 is,
however, to improve the shape of a hyperbox to better fit a
given class of data, not to shift it in the feature space. This
shifting may happen if the lower boundary is moved and the
upper boundary is not, for example. It is, therefore, more
convenient to use a slightly different approach, which
maintains the useful characteristics of a hyperbox. Therefore,
each hyperbox will define a region in the space around the
data point, such that:
X = {x1, x2, …, xj, …, xN} ∈ ℜ^n,
D = {D1, D2, …, Dk, …, Dn} ∈ ℜ^n,
y ∈ Hl = (xj, D) ⇔ xjk − Dk/2 ≤ yk ≤ xjk + Dk/2, ∀k ∈ {1, …, n}, (2)
where X is the set of data points.
Thus, a hyperbox can be defined by one point (in HACO,
one data object) and an n-dimensional vector defining the
edge lengths for each attribute, as in Hl = (xj, D).
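The membership test in the Hl = (xj, D) representation is compact enough to sketch directly (an illustrative Python fragment, since the paper's implementation is in C++; the function name is ours):

```python
def in_hyperbox(y, center, D):
    """Point y lies in hyperbox Hl = (center, D) iff, per Eq. (2),
    center_k - D_k/2 <= y_k <= center_k + D_k/2 for every attribute k."""
    return all(c - d / 2 <= yk <= c + d / 2
               for yk, c, d in zip(y, center, D))
```

For example, for a 2-by-2 box centered at the origin, the point (0.5, 0.5) is inside and (1.5, 0) is not.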
Hyperbox classifiers can give straightforward
interpretation for classification rules [11], such as “if y ∈ [al,
bl] then y belongs to the class defined by Hl”, without
calculating any distances. Also, if associated with a fuzzy
membership function, they can be used as inputs for fuzzy
min-max neural networks to be applied in classification [2] or
clustering [3]. In these applications, data is assumed to be
labeled and part of it is used for training.
3. HACO2: Hyperbox classification with Ant
Colony Optimization
Evolutionary and meta-heuristic methods are global
search techniques, while the usual deterministic and stochastic
methods are local ones [4][5][12], i.e., the latter require
good initial conditions to output a good result. It is
computationally prohibitive to search all possible
configurations for an optimal criterion value; ACO, however,
has demonstrated fast convergence [13][14][15]. The proposed
Hyperbox classification with Ant Colony Optimization
(HACO2) method receives a vector of hyperboxes, which
define regions of the feature space as specific classes, and
attempts to reshape them in order to improve the overall
classification fitness.
A hyperbox classifier provides a straightforward
extraction of classification rules. The simplicity of use and its
potential for improvement (Figure 2) are also important
advantages of using it. In theory, any number of hyperboxes
of any size/shape can be added to improve classification, so
the classifier’s geometry will become as complex as required.
Practical applications, naturally, aim for a reasonable number
to minimize the computational cost.
Figure 2 – Gray area represents the data. (a) Simple hyperbox
classifier, (b) improved classifier with additional hyperboxes.
An interesting characteristic of HACO2 is the influence
of its initialization. The aim of the proposed method is to
improve the classification, so the given hyperboxes should be
properly initialized, i.e., their initial position should be near a
centroid of an agglomeration of data which belongs to one
class. Therefore, a preprocessing step like the Fuzzy C-Means
(FCM) or K-Nearest Neighbors, though not necessary, may
provide a considerably improved classification result.
The initial input includes the ACO parameters, a vector
of hyperboxes whose edges may be extended or reduced in
order to better fit the data, a dimension change ratio cdr ∈ [0,
1] and the maximum number of times it may be used. The
algorithm will run while the following conditions are true:
- the number of iterations is less than a given threshold;
- the accuracy of the best solution is less than a desired threshold.
The dimension change ratio indicates the minimum
change in an edge’s length during the execution.
For example, assuming an edge length of 10, if cdr = 0.1 and
the usage limit is 2, then the algorithm may give a solution
for this edge within the set {8, 9, 10, 11, 12}. This means a
10% increase/decrease of the original length applied at most
two times. Smaller changes may give a more accurate result
but can also take much longer to achieve the desired result.
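The set of admissible edge lengths implied by cdr and the usage limit can be enumerated directly (a Python sketch; the function name and signature are our assumptions):

```python
def candidate_lengths(length, cdr, max_uses):
    """Admissible edge lengths under the linear scheme: the original
    length changed by at most max_uses steps of cdr * length each."""
    step = cdr * length
    return [length + k * step for k in range(-max_uses, max_uses + 1)]
```

With length 10, cdr = 0.1, and a usage limit of 2, this yields the set {8, 9, 10, 11, 12} from the example above.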
Initialization of the pheromone matrix is done by setting
all values to a small positive value τ0 ∈ ℜ [4][5][6][7], so no
option is preferred over the others, concluding the
initialization procedures.
HACO2 builds a solution vector Sr in every iteration for each
of the R agents by applying the information stored in the
pheromone matrix Τ to assign a dimension ratio to each of the
hyperboxes’ edges.
This can be done in two ways. The first process is called
exploitation (greedy choice) [4][5][6][7], where the
agent chooses the dimension ratio with the highest pheromone
value, with probability Q0 ∈ (0, 1). In case this probability is
not satisfied, the agent chooses one of the possible
dimension ratios according to a stochastic distribution pjl over
the pheromone values, such that:
pjl = τjl / Σ(j=1 to N) τjl, (3)
where τjl is the element of Τ that associates the
pheromone concentration of the l-th dimension to the j-th
hyperbox. This process is called exploration [4][5][6].
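The exploitation/exploration choice over one row of the pheromone matrix can be sketched as follows (a Python illustration of the ACS-style pseudo-random proportional rule; the function name and row-wise indexing are our assumptions):

```python
import random

def choose_ratio(tau_row, q0, rng=random):
    """With probability q0, exploit: pick the index with the highest
    pheromone value.  Otherwise explore: sample index j with
    probability tau_row[j] / sum(tau_row), as in Eq. (3)."""
    if rng.random() < q0:
        return max(range(len(tau_row)), key=tau_row.__getitem__)
    # roulette-wheel selection proportional to the pheromone values
    r = rng.random() * sum(tau_row)
    acc = 0.0
    for j, t in enumerate(tau_row):
        acc += t
        if r <= acc:
            return j
    return len(tau_row) - 1
```

With Q0 close to 1 (0.98 in Table 1), the agents mostly exploit the best-known option, with occasional exploration.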
Local optimization is done after the agents have built the
solution. The system tries to optimize the best L solutions by
changing the dimension ratio assigned to each edge with a
local search probability PLS. The modified solution is stored
if it is an improvement on quality.
Updating the pheromone matrix according to the quality
of the solutions is the final, and most important, step in each
iteration. This resembles the elitist selection strategy used in
genetic algorithms [7], where only the best fit individuals in
the population are carried on to the next generation [11]. It
ensures the agents will tend to build new solutions based on
the knowledge gained from previous iterations. Each
pheromone value is updated as:
τjl(t+1) = ρ·τjl(t) + Σ(m=1 to L) accuracy(Sm), (4)
where ρ is the trail persistence that defines how well the
system remembers the knowledge acquired with previous
solutions (1 - ρ is called the evaporation rate).
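Equation (4) translates into a short update routine (a Python sketch; restricting the deposit to the entries used by the best L solutions is our assumption, as Eq. (4) leaves that indexing implicit):

```python
def update_pheromone(tau, best_solutions, rho):
    """Eq. (4): tau_jl(t+1) = rho * tau_jl(t) plus the accuracies of the
    best L solutions that used entry (j, l).  Each solution is assumed
    to be a pair (accuracy, entries), with entries a set of (j, l)."""
    for row in tau:
        for l in range(len(row)):
            row[l] *= rho  # evaporation: a 1 - rho share is forgotten
    for accuracy, entries in best_solutions:
        for j, l in entries:
            tau[j][l] += accuracy  # deposit proportional to quality
    return tau
```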
The quality of the solution defines how the pheromone
matrix will be updated and, therefore, how the system will
search for solutions. In HACO2, the objective is to maximize
its accuracy [1], i.e., the number of data objects of a given
class contained in a hyperbox while minimizing the number
of data objects inside the hyperbox that are not of such class.
Thus, the proposed method aims to maximize the accuracy of
a solution:
accuracy = (TP + TN) / N, (5)
where TP is the number of samples that have been correctly
positively classified, TN is the number of samples that have
been correctly negatively classified and N is the total number
of samples.
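For a single hyperbox and a designated positive class, the accuracy of Eq. (5) can be computed directly over a labeled data set (a Python sketch; the function name and arguments are ours):

```python
def hyperbox_accuracy(points, labels, center, D, positive):
    """Eq. (5): (TP + TN) / N for a one-hyperbox classifier in the
    center/edge-length representation of Eq. (2)."""
    tp = tn = 0
    for y, label in zip(points, labels):
        inside = all(c - d / 2 <= yk <= c + d / 2
                     for yk, c, d in zip(y, center, D))
        if inside and label == positive:
            tp += 1   # correctly classified as the hyperbox's class
        elif not inside and label != positive:
            tn += 1   # correctly left outside the hyperbox
    return (tp + tn) / len(points)
```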
Overlapping hyperboxes are, naturally, considered
to represent the same class. Two hyperboxes overlap if:
∃ y ∈ ℜ^n s.t. y ∈ Hj and y ∈ Hk, Hj, Hk ∈ H, j ≠ k, (6)
where y is any n-dimensional point. It is important to
emphasize that even though two or more hyperboxes may
overlap, according to (6) a data object may only belong to
one hyperbox when calculating the classification ratio.
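In the center/edge-length representation, the overlap condition of Eq. (6) reduces to a per-dimension interval-intersection test (a Python sketch; the closed-interval convention, under which touching boxes count as overlapping, is our assumption):

```python
def overlap(center_a, D_a, center_b, D_b):
    """Eq. (6): two hyperboxes overlap iff their intervals intersect in
    every dimension, i.e. |ca - cb| <= (Da + Db) / 2 per attribute."""
    return all(abs(ca - cb) <= (da + db) / 2
               for ca, da, cb, db in zip(center_a, D_a, center_b, D_b))
```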
4. HACO2 Experiments on 2D Artificial and
Iris Data Sets
4.1 Experimental Setup
Experiments are run under Mac OS X 10.5 on a
2.16 GHz Intel Core 2 Duo processor with 2 GB of RAM,
running a C++ implementation of the proposed Hyperbox
classification with Ant Colony Optimization. The goal is to
shape the classifier to best fit the data, and this is achieved by
using the ACO to increase/decrease the lengths of the
edge(s) of the hyperbox(es).
The first experiment considers classification into two
classes: an ellipsoidal data set composing one class,
surrounded by a second class. The second
experiment considers the classification of the iris database, a
well-known benchmark set.
The synthetic data set, illustrated in Figure 3, consists of
500 data points in a two-dimensional feature space, generated
from a normal distribution with mean µ = 0 and variance σ² = 1.
The low feature dimensionality allows easier visualization,
though it must be noted that the proposed method can handle
high dimensionality as well.
Initial positioning for the hyperboxes is obtained with
the Fuzzy C-Means algorithm. Fuzzy clustering extends the
notion of association between objects and clusters using a
membership function that defines with what degree each
object belongs to each cluster [14]. FCM is a well known
clustering method which runs for at most a number I of
iterations or until the difference between the weights in two
consecutive iterations is less than an error threshold ε. The
shape of the fuzzy sets is controlled by a fuzzification factor
m > 1 [11] and initial fuzzy weight values (belonging to [0,
1]) are assigned randomly.
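A minimal version of the FCM preprocessing step, with the random membership initialization, fuzzification factor m, and stopping rule described above, might look like this (a simplified Python sketch under our own naming, not the paper's C++ implementation):

```python
import random

def fcm(points, c, m=2.0, iters=100, eps=1e-4, seed=0):
    """Minimal Fuzzy C-Means: alternate center and membership updates
    until the memberships change by less than eps or iters runs out."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # random initial memberships in [0, 1], normalized per data point
    u = [[rng.random() for _ in range(c)] for _ in range(n)]
    u = [[w / sum(row) for w in row] for row in u]
    for _ in range(iters):
        # cluster centers: means weighted by memberships raised to m
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            s = sum(w)
            centers.append([sum(w[i] * points[i][k] for i in range(n)) / s
                            for k in range(dim)])
        # memberships from ratios of squared distances to the centers
        new_u = []
        for i in range(n):
            d = [max(1e-12, sum((points[i][k] - centers[j][k]) ** 2
                                for k in range(dim))) for j in range(c)]
            new_u.append([1.0 / sum((d[j] / d[l]) ** (1.0 / (m - 1))
                                    for l in range(c)) for j in range(c)])
        delta = max(abs(new_u[i][j] - u[i][j])
                    for i in range(n) for j in range(c))
        u = new_u
        if delta < eps:
            break
    return centers, u
```

The returned centers then serve as the hyperbox anchor points xj of Eq. (2).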
ACO parameters are set empirically with values that are
known to produce reasonable results [4][5][6][7]. The
number of hyperboxes given to HACO2 is one, two or four in
order to verify their influence on the result. Hyperboxes’
positions are obtained from the FCM and their initial edge
length is half of the feature space span. Other parameters
values input for HACO2 are shown in Table 1.
Table 1 – Algorithm Input Parameter Values
FCM:    I = 1000, ε = 0.0001, m = 2
HACO2:  I = 1000, R = 20, Q0 = 0.98, L = 1, PLS = 0.01,
        ρ = 0.99, cdr = 0.2, usageNumber = 8
The algorithm was applied in ten runs for statistical
validation and to avoid local trapping due to the random
initialization. The data is divided into two parts: 70% of it is
used in evolving the hyperbox shape and the remaining 30%
is used for testing. The accuracy values are obtained from the
test set and results are illustrated in Table 2 and Figure 3.
Using only one hyperbox, an over 90% accuracy can be
obtained quickly, and though the classifier’s geometry does
provide a reasonable representation of the actual data shape, it
is clear it can be improved by looking at the regions with few
data points in the corners. These regions may account for
false positive classifications.
Two hyperboxes result in a slightly more accurate result,
around 98%, and also provide an improved representation of
the data. In this case, it is worth noting that the regions with
lower agglomeration of data points are also smaller and,
therefore, so is the possibility of false positive classifications.
A higher accuracy, around 99%, is obtained with four
hyperboxes, and an even more complex representation of the
data set’s geometry.
Table 2 – Experimental results for 2D data
Hyperbox(es)   Accuracy   Average Runtime [ms]
1              90.8%       34847
2              97.2%       69172
4              99.2%      138704
As expected, increasing the number of hyperboxes used
also improves the representation and, usually, the accuracy.
This, naturally, has the setback of additional computational
cost. Using two hyperboxes, the time required
for obtaining a result is twice as long as the time necessary to
run HACO2 with only one hyperbox. Considering four
hyperboxes, this time is four times longer. A reasonable
tradeoff must be established to use this method to its fullest
potential.
Speed may be improved by changing how the edges’
lengths are adjusted. The current implementation uses a linear
approach, increasing/decreasing the length by the same step
every time. Instead, an exponential approach could be used.
Assuming an edge length of 10, if cdr = 0.1 and the usage
limit is 2, then an exponential change would provide a
solution for this edge within the set {2.5, 5, 10, 20, 40}.
Another possibility is to add a momentum term to the ratio,
similar to many algorithms such as the backpropagation for
training neural networks [10].
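The exponential alternative can be enumerated in the same way as the linear scheme (a Python sketch; the factor of 2 is taken from the example set {2.5, 5, 10, 20, 40} above rather than derived from cdr, and the function name is ours):

```python
def exponential_lengths(length, max_uses):
    """Exponential scheme: each use doubles or halves the edge length,
    so max_uses applications span a wider range than the linear steps."""
    return [length * 2.0 ** k for k in range(-max_uses, max_uses + 1)]
```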
Figure 3 – Results for HACO2 with a) one hyperbox, b) two
hyperboxes and c) four hyperboxes for the 2D data.
The iris database consists of 150 samples with four
features, divided into three classes, iris-setosa, iris-versicolor
and iris-virginica, with 50 samples each. The latter two
overlap and, for the purpose of this experiment, are
considered as one class. The goal is to optimize a single
hyperbox classifier to discriminate the two classes.
As with the 2D data, FCM was applied to initialize
the hyperbox center, and the initial edge length is half of the
feature space span. FCM and HACO2 parameters are the
same as in Table 1.
The algorithm was again applied in ten runs. The data is
divided into two parts, 70% of it is used in evolving the
hyperbox shape and the remaining 30% is used for testing.
The accuracy value for this experiment is 95.3%, showing
that HACO2 can be applied to real-world multi-dimensional
data with reasonable results.
Pedrycz et al. [11] proposed the following ratio to help
assess the discriminative property of the features based on the
coverage of a hyperbox:

ratio = (upper bound − lower bound of the hyperbox edge) /
        (length of the feature’s range), (7)
This can be applied when using only one hyperbox to
establish which features are more meaningful for
discriminating. The lower the value, the more meaningful the
feature is. If the ratio approaches 1, the hyperbox edge spans
most of the feature’s range and, thus, the feature is not
meaningful. Also, ranking the
features according to this ratio can be reflected in the
application of rules [11], further improving the final
application of the classifier. Table 3 shows the ratio for the
iris database used in the experiment.
Table 3 – Iris-setosa discriminative property ratio
Feature   Ratio (%)
I          34.72
II         75.00
III        33.92
IV        100.00
From the table it is clear that the first and third features
are more discriminative than the other two. Like principal
component analysis, this approach can be used to simplify
applications by considering only the most discriminative
features, lowering the dimensionality and, thus, the cost.
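The ratio of Eq. (7) is a one-liner per feature (a Python sketch; the argument names are ours):

```python
def discriminative_ratio(lower, upper, feat_min, feat_max):
    """Eq. (7): hyperbox edge length over the feature's full range;
    values near 1 mean the edge spans the range and the feature
    discriminates poorly, values near 0 mean it discriminates well."""
    return (upper - lower) / (feat_max - feat_min)
```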
A reduced dimensionality may make the
classifier even easier to use. This, along with the use of
hyperboxes, makes the extraction of classification rules a
simple process. Considering these last results, a simplified
classifier would have just two rules to define whether a given
set of attributes represents an instance of iris-setosa, one
for each of the most discriminative features.
5. Conclusion
The Hyperbox classification with Ant Colony
Optimization (HACO2) method is proposed for data
classification. It uses the Ant Colony Optimization (ACO) to
evolve the geometry of hyperboxes in the feature space to
better fit the data.
The HACO2 method is validated for data classification
with computer-generated 2D data, and the influence of the
number of hyperboxes in use is investigated. It is
also applied to a benchmark database. Both resulting
classifiers achieved over 95% accuracy. The
experimental results show that HACO2 is a classification
method that considers the shape of the input data, suitable for
pattern recognition applications.
Straightforward and intuitive rules can be extracted
easily, with no need for additional calculation to verify
whether a point belongs to a class or not. In case of using one
hyperbox, meaningful information can be extracted for
feature discriminating purposes, indicating the most important
features and, thus, enabling a further simplification of the
classifier by reducing dimensionality.
Future perspectives include automatic parameter tuning
from the input data set (to minimize user burden), and
overcoming the hyperbox initialization shortcomings by testing
approaches different from the FCM used in this work.
Application of HACO2 to real-world problems, such as the
Medical Imaging System (MIS) data set for software quality
assessment, may show its versatility as a competitive
classifier evolving method.
References
[1] R.O. Duda, P.E. Hart, and D.G. Stork, “Pattern
Classification”, Wiley-Interscience, 2000.
[2] P. Simpson, “Fuzzy min-max neural networks -- Part 1:
Classification”, IEEE Transactions on Neural Networks, vol.
3, Sep. 1992, pp. 776-786.
[3] P. Simpson, “Fuzzy min-max neural networks -- Part 2:
Clustering”, IEEE Transactions on Fuzzy Systems, vol. 1, Feb.
1993, pp. 32-45.
[4] M. Dorigo, V. Maniezzo, and A. Colorni, “Ant System”,
IEEE Transactions on Systems, Man, and Cybernetics-Part B,
vol. 26, 1996, pp. 29-41.
[5] M. Dorigo and G. Di Caro, “Ant Colony Optimization
Meta-heuristic”, New Ideas in Optimization, 1999, pp. 11-32.
[6] M. Dorigo and T. Stützle, “Ant Colony Optimization”, The
MIT Press, 2004.
[7] L.M. Gambardella and M. Dorigo, “An Ant Colony
System Hybridized with a New Local Search for the
Sequential Ordering Problem”, INFORMS Journal on
Computing, vol. 12, 2000, pp. 237-255.
[8] J.L. Deneubourg et al., “The self-organizing exploratory
pattern of the argentine ant”, Journal of Insect Behavior, vol.
3, Mar. 1990, pp. 159-168.
[9] S. Goss et al., “Self-organized shortcuts in the Argentine
ant”, Naturwissenschaften, vol. 76, Dec. 1989, pp. 579-581.
[10] T.M. Mitchell, “Machine Learning”, Mcgraw-Hill, 1997.
[11] W. Pedrycz and G. Succi, “Genetic granular
classifiers in modeling software quality”, Journal of Systems
and Software, vol. 76, 2005, pp. 277-285.
[12] W.J. Gutjahr, “A Graph-based Ant system and its
convergence”, Future Generation Computer Systems, vol. 16,
2000, pp. 873-888.
[13] G. Di Caro and M. Dorigo, “AntNet: Distributed
Stigmergic Control for Communication Networks”, Journal
of Artificial Intelligence Research, vol. 9, 1998, pp. 317-365.
[14] N. Monmarché, M. Slimane, and G. Venturini,
“AntClass: discovery of clusters in numeric data by an
hybridization of an ant colony with the Kmeans algorithm”
1999.
[15] M. Dorigo, M. Birattari, and T. Stützle, “Artificial Ants
as a Computational Intelligence Technique,” IEEE
Computational Intelligence Magazine, 2006.