Hyperbox Classifier with
Ant Colony Optimization
G. N. Ramos*, F. Dong, and K. Hirota
Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology
G3-49, 4259 Nagatsuta, Midori-ku, Yokohama-city 226-8502, Japan
*e-mail: ramos@hrt.dis.titech.ac.jp
Abstract – A method, called HACO2 (Hyperbox classification
with Ant Colony Optimization), is proposed for evolving a
classifier for labeled data using hyperboxes and an ant colony
meta-heuristic. It reshapes the hyperboxes in a near-optimal
way, improving the accuracy of the results while preserving
the topological information (inherently associated with
classification) of the data. It also provides a
feature-discriminating ratio to determine which characteristics
are more important for distinguishing classes. The method is
validated using artificial 2D data and then applied to the
benchmark iris database. Both experiments provide results with
over 95% accuracy. Further modifications (automatic parameter
setting), extensions (overcoming initialization shortcomings),
and applications to the field of software assessment are
discussed.
1. Introduction
Pattern classifiers have an underlying geometry [1] in
the feature space, which is directly related to their complexity
and, naturally, to their performance when applied. The
tradeoff, as usual, is that a complex, and therefore slower,
classifier should provide more accurate results than a simpler,
speedier one. Though there are many methods
to create classifiers, not all consider the topological
information. The simplest option for a classifier is an interval
defining a region. In a multi-dimensional space, this is
called a hyperbox [2][3].
A number of hyperboxes can be used as a classifier by
adjusting their positions and dimensions, which clearly
poses a combinatorial problem. The Ant Colony
Optimization (ACO) is a multi-purpose meta-heuristic that
can solve such combinatorial optimization problems [4]. It
has been successfully applied to many theoretical and
real-world problems.
This paper proposes the Hyperbox classification with
Ant Colony Optimization method (HACO2) to evolve a
classifier. It modifies the shape of the hyperboxes to improve
accuracy while maintaining desirable features such as
preserving the inherent topological information and
providing straightforward interpretation and rule extraction.
The resulting classifier may also provide useful insights on
the importance of features for discrimination, in a simple and
understandable way.
The method is applied to a 2D artificial data set for
validation, improving the classification results and providing
accuracy of over 95%. It is then applied to a benchmark
problem, the classification of the iris data set. In this case, it
also provides results with over 95% accuracy and indicates
which features of the data set have a more discriminative
nature.
A brief description of ACO and a definition of hyperboxes
are presented in Section 2; Section 3 proposes and details the
HACO2 method; classification experiments and results are
analyzed in Section 4.
2. Description of Ant Colony Optimization and
Hyperboxes
2.1 A review on Ant Colony Optimization
The ACO meta-heuristic is a population-based approach
for discrete combinatorial optimization problems. It is based
on the way real ant colonies collectively find short paths
[4][5][6][7]. The basic analogy is the synergy of
applying multiple communicating agents to build a solution to
the problem.
Real ants, when foraging for food, communicate with
each other by depositing pheromone on the trail between the
food source and the nest [4][5][6][7][8][9]. The shorter the
trail, the faster the ants will go through it and thus more
pheromone will be deposited. Since ants have a high
probability of following trails with higher pheromone
deposition, the process reinforces itself.
This is a distinctive feature of ACO: the pheromone
matrix works as dynamic memory, indicating how desirable
an object is to the solution [4][5][6][7]. The values are
updated according to the quality of the solutions, so the
process “remembers” good solutions and “forgets” bad ones.
This resembles an elitist approach of Genetic Algorithms
(GAs), in which the chromosomes are first ordered by their
rank [10].
The main characteristics of ACO are positive feedback
(improves speed of finding good solutions), distributed
computation (avoids early convergence) and greedy heuristic
(finds reasonable solutions early in the process) [4][5][6][7].
Due to such characteristics, however, it may be outperformed
by specialized algorithms [4][7].
An ACO algorithm can be simplified into three basic
procedures per iteration: build solutions, local optimization
(an optional step) and pheromone update [4][5][6][7] as
illustrated in Figure 1. Its inherent characteristics
(flexibility and fast convergence) indicate that the ACO
1714
SA-D5-1 SCIS & ISIS 2008
algorithm can be successfully applied to optimizing the
shape of hyperboxes in a classifier to improve its geometry
and, thus, its classification efficiency.
Figure 1 – ACO algorithm flowchart
2.2 Describing Hyperboxes
A hyperbox defines a region in an n-dimensional space
[2][3][11] and is fully described by two vectors, usually its
two extreme points (the lower and upper bounds). Assuming
an n-dimensional space of real numbers (ℜn) and a hyperbox
Hl = (al, bl), where al ≤ bl, a point y is said to be in Hl if:
H = {H1, H2, …, Hl, …, HC}, Hl ⊂ ℜ^n,
y = {y1, y2, …, yk, …, yn} ∈ ℜ^n,
y ∈ Hl ⇔ alk ≤ yk ≤ blk, al, bl ∈ ℜ^n, (1)
where C is the number of hyperboxes and yk is the k-th
attribute of y.
Using this definition it is necessary to have two points,
the boundaries, per hyperbox. The objective of HACO2 is,
however, to improve the shape of a hyperbox to better fit a
given class of data, not to shift it in the feature space. This
shifting may happen if the lower boundary is moved and the
upper boundary is not, for example. It is, therefore, more
convenient to use a slightly different approach, which
maintains the useful characteristics of a hyperbox. Therefore,
each hyperbox will define a region in the space around the
data point, such that:
X = {x1, x2, …, xj, …, xN} ∈ ℜ^n,
D = {D1, D2, …, Dk, …, Dn} ∈ ℜ^n,
y ∈ Hl = (xj, D) ⇔ xjk − Dk/2 ≤ yk ≤ xjk + Dk/2, ∀k ∈ {1, …, n}, (2)
where X is the set of data points.
Thus, a hyperbox can be defined by one point (in HACO,
one data object) and an n-dimensional vector defining the
edge lengths for each attribute, as in Hl = (xj, D).
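The membership test in the Hl = (xj, D) representation is compact enough to sketch directly (an illustrative Python fragment, since the paper's implementation is in C++; the function name is ours):

```python
def in_hyperbox(y, center, D):
    """Point y lies in hyperbox Hl = (center, D) iff, per Eq. (2),
    center_k - D_k/2 <= y_k <= center_k + D_k/2 for every attribute k."""
    return all(c - d / 2 <= yk <= c + d / 2
               for yk, c, d in zip(y, center, D))
```

For example, for a 2-by-2 box centered at the origin, the point (0.5, 0.5) is inside and (1.5, 0) is not.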
Hyperbox classifiers can give straightforward
interpretation for classification rules [11], such as “if y ∈ [al,
bl] then y belongs to the class defined by Hl”, without
calculating any distances. Also, if associated with a fuzzy
membership function, they can be used as inputs for fuzzy
min-max neural networks to be applied in classification [2] or
clustering [3]. In these applications, data is assumed to be
labeled and part of it is used for training.
3. HACO2: Hyperbox classification with Ant
Colony Optimization
Evolutionary and meta-heuristic methods are global
search techniques, while the usual deterministic and stochastic
methods are local ones [4][5][12], i.e., the latter require
good initial conditions to output a good result. It is
computationally prohibitive to search all possible
configurations for an optimal criterion value; ACO, however,
has demonstrated fast convergence [13][14][15]. The proposed
Hyperbox classification with Ant Colony Optimization
(HACO2) method receives a vector of hyperboxes, which
define regions of the feature space as specific classes, and
attempts to reshape them in order to improve the overall
classification fitness.
A hyperbox classifier provides a straightforward
extraction of classification rules. The simplicity of use and its
potential for improvement (Figure 2) are also important
advantages of using it. In theory, any number of hyperboxes
of any size/shape can be added to improve classification, so
the classifier’s geometry will become as complex as required.
Practical applications, naturally, aim for a reasonable number
to minimize the computational cost.
Figure 2 – Gray area represents the data. (a) Simple hyperbox
classifier, (b) improved classifier with additional hyperboxes.
An interesting characteristic of HACO2 is the influence
of its initialization. The aim of the proposed method is to
improve the classification, so the given hyperboxes should be
properly initialized, i.e., their initial position should be near a
centroid of an agglomeration of data which belongs to one
class. Therefore, a preprocessing step like the Fuzzy C-Means
(FCM) or K-Nearest Neighbors, though not necessary, may
provide a considerably improved classification result.
The initial input includes the ACO parameters, a vector
of hyperboxes whose edges may be extended or reduced in
order to better fit the data, a dimension change ratio cdr ∈ [0,
1] and the maximum number of times it may be used. The
algorithm will run while the following conditions are true:
- the number of iterations is less than a given threshold;
- the accuracy of the best solution is less than a desired threshold.
The dimension change ratio indicates the minimum
change in an edge’s length during the execution.
For example, assuming an edge length of 10, if cdr = 0.1 and
the usage limit is 2, then the algorithm may give a solution
for this edge within the set {8, 9, 10, 11, 12}. This means a
10% increase/decrease of the original length applied at most
two times. Smaller changes may give a more accurate result
but can also take much longer to achieve the desired result.
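The set of admissible edge lengths implied by cdr and the usage limit can be enumerated directly (a Python sketch; the function name and signature are our assumptions):

```python
def candidate_lengths(length, cdr, max_uses):
    """Admissible edge lengths under the linear scheme: the original
    length changed by at most max_uses steps of cdr * length each."""
    step = cdr * length
    return [length + k * step for k in range(-max_uses, max_uses + 1)]
```

With length 10, cdr = 0.1, and a usage limit of 2, this yields the set {8, 9, 10, 11, 12} from the example above.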
Initialization of the pheromone matrix is done by setting
all values to a small positive value τ0 ∈ ℜ [4][5][6][7], so no
option is preferred over the others, concluding the
initialization procedures.
HACO2 builds a solution vector Sr in every iteration for each
of the R agents by applying the information stored in the
pheromone matrix Τ to assign a dimension ratio to each of the
hyperboxes’ edges.
This can be done in two ways. The first process is called
exploitation (greedy choice) [4][5][6][7], where the
agent chooses the dimension ratio with the highest pheromone
value, with probability Q0 ∈ (0, 1). In case this probability is
not satisfied, the agent chooses one of the possible
dimension ratios according to a stochastic distribution pjl over
the pheromone values, such that:
pjl = τjl / Σ(j=1 to N) τjl, (3)
where τjl is the element of Τ that associates the
pheromone concentration of the l-th dimension to the j-th
hyperbox. This process is called exploration [4][5][6].
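The exploitation/exploration choice over one row of the pheromone matrix can be sketched as follows (a Python illustration of the ACS-style pseudo-random proportional rule; the function name and row-wise indexing are our assumptions):

```python
import random

def choose_ratio(tau_row, q0, rng=random):
    """With probability q0, exploit: pick the index with the highest
    pheromone value.  Otherwise explore: sample index j with
    probability tau_row[j] / sum(tau_row), as in Eq. (3)."""
    if rng.random() < q0:
        return max(range(len(tau_row)), key=tau_row.__getitem__)
    # roulette-wheel selection proportional to the pheromone values
    r = rng.random() * sum(tau_row)
    acc = 0.0
    for j, t in enumerate(tau_row):
        acc += t
        if r <= acc:
            return j
    return len(tau_row) - 1
```

With Q0 close to 1 (0.98 in Table 1), the agents mostly exploit the best-known option, with occasional exploration.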
Local optimization is done after the agents have built the
solution. The system tries to optimize the best L solutions by
changing the dimension ratio assigned to each edge with a
local search probability PLS. The modified solution is stored
if it is an improvement on quality.
Updating the pheromone matrix according to the quality
of the solutions is the final, and most important, step in each
iteration. This resembles the elitist selection strategy used in
genetic algorithms [7], where only the best fit individuals in
the population are carried on to the next generation [11]. It
ensures the agents will tend to build new solutions based on
the knowledge gained from previous iterations. Each
pheromone value is updated as:
τjl(t+1) = ρ·τjl(t) + Σ(m=1 to L) accuracy(Sm), (4)
where ρ is the trail persistence that defines how well the
system remembers the knowledge acquired with previous
solutions (1 - ρ is called the evaporation rate).
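Equation (4) translates into a short update routine (a Python sketch; restricting the deposit to the entries used by the best L solutions is our assumption, as Eq. (4) leaves that indexing implicit):

```python
def update_pheromone(tau, best_solutions, rho):
    """Eq. (4): tau_jl(t+1) = rho * tau_jl(t) plus the accuracies of the
    best L solutions that used entry (j, l).  Each solution is assumed
    to be a pair (accuracy, entries), with entries a set of (j, l)."""
    for row in tau:
        for l in range(len(row)):
            row[l] *= rho  # evaporation: a 1 - rho share is forgotten
    for accuracy, entries in best_solutions:
        for j, l in entries:
            tau[j][l] += accuracy  # deposit proportional to quality
    return tau
```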
The quality of the solution defines how the pheromone
matrix will be updated and, therefore, how the system will
search for solutions. In HACO2, the objective is to maximize
its accuracy [1], i.e., the number of data objects of a given
class contained in a hyperbox while minimizing the number
of data objects inside the hyperbox that are not of such class.
Thus, the proposed method aims to maximize the accuracy of
a solution:
accuracy = (TP + TN) / N, (5)
where TP is the number of samples that have been correctly
positively classified, TN is the number of samples that have
been correctly negatively classified and N is the total number
of samples.
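For a single hyperbox and a designated positive class, the accuracy of Eq. (5) can be computed directly over a labeled data set (a Python sketch; the function name and arguments are ours):

```python
def hyperbox_accuracy(points, labels, center, D, positive):
    """Eq. (5): (TP + TN) / N for a one-hyperbox classifier in the
    center/edge-length representation of Eq. (2)."""
    tp = tn = 0
    for y, label in zip(points, labels):
        inside = all(c - d / 2 <= yk <= c + d / 2
                     for yk, c, d in zip(y, center, D))
        if inside and label == positive:
            tp += 1   # correctly classified as the hyperbox's class
        elif not inside and label != positive:
            tn += 1   # correctly left outside the hyperbox
    return (tp + tn) / len(points)
```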
Overlapping hyperboxes are, naturally, considered
to represent the same class. Two hyperboxes overlap if:
∃ y ∈ ℜ^n s.t. y ∈ Hj and y ∈ Hk, Hj, Hk ∈ H, j ≠ k, (6)
where y is any n-dimensional point. It is important to
emphasize that even though two or more hyperboxes may
overlap, according to (6) a data object may only belong to
one hyperbox when calculating the classification ratio.
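In the center/edge-length representation, the overlap condition of Eq. (6) reduces to a per-dimension interval-intersection test (a Python sketch; the closed-interval convention, under which touching boxes count as overlapping, is our assumption):

```python
def overlap(center_a, D_a, center_b, D_b):
    """Eq. (6): two hyperboxes overlap iff their intervals intersect in
    every dimension, i.e. |ca - cb| <= (Da + Db) / 2 per attribute."""
    return all(abs(ca - cb) <= (da + db) / 2
               for ca, da, cb, db in zip(center_a, D_a, center_b, D_b))
```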
4. HACO2 Experiments on 2D Artificial and
Iris Data Sets
4.1 Experimental Setup
Experiments are run under Mac OS X 10.5 on a
2.16 GHz Intel Core 2 Duo processor with 2 GB of RAM,
running a C++ implementation of the proposed Hyperbox
classification with Ant Colony Optimization. The goal is to
shape the classifier to best fit the data, and this is achieved by
using the ACO to increase/decrease the lengths of the
edge(s) of the hyperbox(es).
The first experiment considers classification into two
classes: an ellipsoidal data set composing one class,
surrounded by a second class. The second
experiment considers the classification of the iris database, a
well-known benchmark set.
The synthetic data set, illustrated in Figure 3, consists of
500 data points in a two-dimensional feature space, generated
from a normal distribution with mean µ = 0 and variance σ² = 1.
The low feature dimensionality allows easier visualization,
though it must be noted that the proposed method can handle
high dimensionality as well.
Initial positioning for the hyperboxes is obtained with
the Fuzzy C-Means algorithm. Fuzzy clustering extends the
notion of association between objects and clusters using a
membership function that defines with what degree each
object belongs to each cluster [14]. FCM is a well known
clustering method which runs for at most a number I of
iterations or until the difference between the weights in two
consecutive iterations is less than an error threshold ε. The
shape of the fuzzy sets is controlled by a fuzzification factor
m > 1 [11] and initial fuzzy weight values (belonging to [0,
1]) are assigned randomly.
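A minimal version of the FCM preprocessing step, with the random membership initialization, fuzzification factor m, and stopping rule described above, might look like this (a simplified Python sketch under our own naming, not the paper's C++ implementation):

```python
import random

def fcm(points, c, m=2.0, iters=100, eps=1e-4, seed=0):
    """Minimal Fuzzy C-Means: alternate center and membership updates
    until the memberships change by less than eps or iters runs out."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # random initial memberships in [0, 1], normalized per data point
    u = [[rng.random() for _ in range(c)] for _ in range(n)]
    u = [[w / sum(row) for w in row] for row in u]
    for _ in range(iters):
        # cluster centers: means weighted by memberships raised to m
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            s = sum(w)
            centers.append([sum(w[i] * points[i][k] for i in range(n)) / s
                            for k in range(dim)])
        # memberships from ratios of squared distances to the centers
        new_u = []
        for i in range(n):
            d = [max(1e-12, sum((points[i][k] - centers[j][k]) ** 2
                                for k in range(dim))) for j in range(c)]
            new_u.append([1.0 / sum((d[j] / d[l]) ** (1.0 / (m - 1))
                                    for l in range(c)) for j in range(c)])
        delta = max(abs(new_u[i][j] - u[i][j])
                    for i in range(n) for j in range(c))
        u = new_u
        if delta < eps:
            break
    return centers, u
```

The returned centers then serve as the hyperbox anchor points xj of Eq. (2).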
ACO parameters are set empirically with values that are
known to produce reasonable results [4][5][6][7]. The
number of hyperboxes given to HACO2 is one, two or four in
order to verify their influence on the result. Hyperboxes’
positions are obtained from the FCM and their initial edge
length is half of the feature space span. Other parameters
values input for HACO2 are shown in Table 1.
Table 1 – Algorithm Input Parameter Values
FCM:    I = 1000, ε = 0.0001, m = 2
HACO2:  I = 1000, R = 20, Q0 = 0.98, L = 1, PLS = 0.01,
        ρ = 0.99, cdr = 0.2, usageNumber = 8
The algorithm was applied in ten runs for statistical
validation and to avoid local trapping due to the random
initialization. The data is divided into two parts: 70% of it is
used in evolving the hyperbox shape and the remaining 30%
is used for testing. The accuracy values are obtained from the
test set and results are illustrated in Table 2 and Figure 3.
Using only one hyperbox, an over 90% accuracy can be
obtained quickly, and though the classifier’s geometry does
provide a reasonable representation of the actual data shape, it
is clear it can be improved by looking at the regions with few
data points in the corners. These regions may account for
false positive classifications.
Two hyperboxes result in a slightly more accurate result,
around 98%, and also provide an improved representation of
the data. In this case, it is worth noting that the regions with
lower agglomeration of data points are also smaller and,
therefore, so is the possibility of false positive classifications.
A higher accuracy, around 99%, is obtained with four
hyperboxes, and an even more complex representation of the
data set’s geometry.
Table 2 – Experimental results for 2D data
Hyperbox(es)   Accuracy   Average Runtime [ms]
1              90.8%       34847
2              97.2%       69172
4              99.2%      138704
As expected, increasing the number of hyperboxes used
also improves the representation and, usually, the accuracy.
This, naturally, has the setback of additional computational
cost. Using two hyperboxes, the time required
for obtaining a result is twice as long as the time necessary to
run HACO2 with only one hyperbox. Considering four
hyperboxes, this time is four times longer. A reasonable
tradeoff must be established to use this method to its fullest
potential.
Speed may be improved by changing how the edges’
lengths are adjusted. The current implementation uses a linear
approach, increasing/decreasing the length by the same step
every time. Instead, an exponential approach could be used.
Assuming an edge length of 10, if cdr = 0.1 and the usage
limit is 2, then an exponential change would provide a
solution for this edge within the set {2.5, 5, 10, 20, 40}.
Another possibility is to add a momentum term to the ratio,
similar to many algorithms such as the backpropagation for
training neural networks [10].
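The exponential alternative can be enumerated in the same way as the linear scheme (a Python sketch; the factor of 2 is taken from the example set {2.5, 5, 10, 20, 40} above rather than derived from cdr, and the function name is ours):

```python
def exponential_lengths(length, max_uses):
    """Exponential scheme: each use doubles or halves the edge length,
    so max_uses applications span a wider range than the linear steps."""
    return [length * 2.0 ** k for k in range(-max_uses, max_uses + 1)]
```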
Figure 3 – Results for HACO2 with a) one hyperbox, b) two
hyperboxes and c) four hyperboxes for the 2D data.
The iris database consists of 150 samples with four
features, divided into three classes, iris-setosa, iris-versicolor
and iris-virginica, with 50 samples each. The latter two
overlap and, for the purpose of this experiment, are
considered as one class. The goal is to optimize a single
hyperbox classifier to discriminate the two classes.
As with the 2D data, FCM was applied to initialize
the hyperbox center, and the initial edge length is half of the
feature space span. FCM and HACO2 parameters are the
same as in Table 1.
The algorithm was again applied in ten runs. The data is
divided into two parts, 70% of it is used in evolving the
hyperbox shape and the remaining 30% is used for testing.
The accuracy value for this experiment is 95.3%, showing
that HACO2 can be applied to real-world multi-dimensional
data with reasonable results.
Pedrycz et al. [11] proposed the following ratio to help
assess the discriminative property of the features based on the
coverage of a hyperbox:

ratio = (upper bound − lower bound of the hyperbox edge) /
        (length of the feature’s range), (7)
This can be applied when using only one hyperbox to
establish which features are more meaningful for
discriminating. The lower the value, the more meaningful the
feature is. If the ratio approaches 1, the hyperbox edge spans
most of the feature’s range and, thus, the feature is not
meaningful. Also, ranking the
features according to this ratio can be reflected in the
application of rules [11], further improving the final
application of the classifier. Table 3 shows the ratio for the
iris database used in the experiment.
Table 3 – Iris-setosa discriminative property ratio
Feature   Ratio (%)
I          34.72
II         75.00
III        33.92
IV        100.00
From the table it is clear that the first and third features
are more discriminative than the other two. Like principal
component analysis, this approach can be used to simplify
applications by considering only the most discriminative
features, lowering the dimensionality and, thus, the cost.
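The ratio of Eq. (7) is a one-liner per feature (a Python sketch; the argument names are ours):

```python
def discriminative_ratio(lower, upper, feat_min, feat_max):
    """Eq. (7): hyperbox edge length over the feature's full range;
    values near 1 mean the edge spans the range and the feature
    discriminates poorly, values near 0 mean it discriminates well."""
    return (upper - lower) / (feat_max - feat_min)
```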
A reduced dimensionality may make the
classifier even easier to use. This, along with the use of
hyperboxes, makes the extraction of classification rules a
simple process. Considering these last results, a simplified
classifier would have just two rules to define whether a given
set of attributes represents an instance of iris-setosa, one
for each of the most discriminative features.
5. Conclusion
The Hyperbox classification with Ant Colony
Optimization (HACO2) method is proposed for data
classification. It uses the Ant Colony Optimization (ACO) to
evolve the geometry of hyperboxes in the feature space to
better fit the data.
The HACO2 method is validated for data classification
with computer-generated 2D data, and the influence of the
number of hyperboxes in use is investigated. It is
also applied to a benchmark database. Both resulting
classifiers achieved over 95% accuracy. The
experimental results show that HACO2 is a classification
method that considers the shape of the input data, suitable for
pattern recognition applications.
Straightforward and intuitive rules can be extracted
easily, with no need for additional calculation to verify
whether a point belongs to a class or not. In case of using one
hyperbox, meaningful information can be extracted for
feature discriminating purposes, indicating the most important
features and, thus, enabling a further simplification of the
classifier by reducing dimensionality.
Future perspectives include automatic parameter tuning
from the input data set (to minimize user burden), and
overcoming the hyperbox initialization shortcomings by testing
approaches different from the FCM used in this work.
Application of HACO2 to real-world problems, such as the
Medical Imaging System (MIS) data set for software quality
assessment, may show its versatility as a competitive
classifier evolving method.
References
[1] R.O. Duda, P.E. Hart, and D.G. Stork, “Pattern
Classification”, Wiley-Interscience, 2000.
[2] P. Simpson, “Fuzzy min-max neural networks -- Part 1:
Classification”, IEEE Transactions on Neural Networks, vol.
3, Sep. 1992, pp. 776-786.
[3] P. Simpson, “Fuzzy min-max neural networks -- Part 2:
Clustering”, IEEE Transactions on Fuzzy Systems, vol. 1, Feb.
1993, pp. 32-45.
[4] M. Dorigo, V. Maniezzo, and A. Colorni, “Ant System”,
IEEE Transactions on Systems, Man, and Cybernetics-Part B,
vol. 26, 1996, pp. 29-41.
[5] M. Dorigo and G. Di Caro, “Ant Colony Optimization
Meta-heuristic”, New Ideas in Optimization, 1999, pp. 11-32.
[6] M. Dorigo and T. Stützle, “Ant Colony Optimization”, The
MIT Press, 2004.
[7] L.M. Gambardella and M. Dorigo, “An Ant Colony
System Hybridized with a New Local Search for the
Sequential Ordering Problem”, INFORMS Journal on
Computing, vol. 12, 2000, pp. 237-255.
[8] J.L. Deneubourg et al., “The self-organizing exploratory
pattern of the argentine ant”, Journal of Insect Behavior, vol.
3, Mar. 1990, pp. 159-168.
[9] S. Goss et al., “Self-organized shortcuts in the Argentine
ant”, Naturwissenschaften, vol. 76, Dec. 1989, pp. 579-581.
[10] T.M. Mitchell, “Machine Learning”, Mcgraw-Hill, 1997.
[11] W. Pedrycz and G. Succi, “Genetic granular
classifiers in modeling software quality”, Journal of Systems
and Software, vol. 76, 2005, pp. 277-285.
[12] W.J. Gutjahr, “A Graph-based Ant system and its
convergence”, Future Generation Computer Systems, vol. 16,
2000, pp. 873-888.
[13] G. Di Caro and M. Dorigo, “AntNet: Distributed
Stigmergic Control for Communication Networks”, Journal
of Artificial Intelligence Research, vol. 9, 1998, pp. 317-365.
[14] N. Monmarché, M. Slimane, and G. Venturini,
“AntClass: discovery of clusters in numeric data by an
hybridization of an ant colony with the Kmeans algorithm”
1999.
[15] M. Dorigo, M. Birattari, and T. Stützle, “Artificial Ants
as a Computational Intelligence Technique,” IEEE
Computational Intelligence Magazine, 2006.