AMPSO: A New Particle Swarm Method For Nearest Neighborhood Classification
Abstract—Nearest prototype methods can be quite successful on many pattern classification problems. In these methods, a collection of prototypes has to be found that accurately represents the input patterns. The classifier then assigns classes based on the nearest prototype in this collection. In this paper, we first use the standard particle swarm optimizer (PSO) algorithm to find those prototypes. Second, we present a new algorithm, called adaptive Michigan PSO (AMPSO), in order to reduce the dimension of the search space and provide more flexibility than the former in this application. AMPSO is based on a different approach to particle swarms, as each particle in the swarm represents a single prototype in the solution. The swarm does not converge to a single solution; instead, each particle is a local classifier, and the whole swarm is taken as the solution to the problem. It uses modified PSO equations with both particle competition and cooperation and a dynamic neighborhood. As an additional feature, in AMPSO, the number of prototypes represented in the swarm is able to adapt to the problem, increasing as needed the number of prototypes and classes of the prototypes that make up the solution to the problem. We compared the results of the standard PSO and AMPSO on several benchmark problems from the University of California, Irvine, data sets and found that AMPSO always found a better solution than the standard PSO. We also found that it was able to improve the results of the Nearest Neighbor classifiers, and it is also competitive with some of the algorithms most commonly used for classification.

Index Terms—Data mining, Nearest Neighbor (NN), particle swarm, pattern classification, swarm intelligence.

I. INTRODUCTION

system can use an example set of data (training data) to "learn" how to perform its task, we talk about supervised learning. The classifier must be able to "generalize" from the regularities extracted from data already known and assign the correct classes to new data introduced in the system in the future.

A more specific field in classification is nearest neighbor (NN or 1-NN) classification. NN is a "lazy" learning method because training data is not preprocessed in any way. The class assigned to a pattern is the class of the nearest pattern known to the system, measured in terms of a distance defined on the feature (attribute) space. On this space, each pattern defines a region (called its Voronoi region). When the distance is the classical Euclidean distance, Voronoi regions are delimited by linear borders. To improve over 1-NN classification, more than one neighbor may be used to determine the class of a pattern (K-NN), or distances other than the Euclidean may be used.

A further refinement in NN classification is replacing the original training data by a set of prototypes that correctly "represent" it. Once this is done, the resulting classifier assigns classes by calculating distances to the prototypes, not to the original training data, which is discarded. This means that classification of new patterns is performed much faster, as the number of prototypes is much less than the total number of patterns. Besides reducing the complexity of the solution (measured by the number of prototypes), these "Nearest Prototype" algorithms are able to improve the accuracy of the solution of
In the context of NN classification with PSO, in [11], the swarm is used to determine the optimum position for the centroids of data clusters, which are then assigned the centroid class.

This paper presents two approaches to solve the problem of prototype placement for nearest prototype classifiers.

1) In a standard approach of PSO, a potential solution is encoded in each particle. The information that has to be encoded is the set of prototypes and the prototypes' classes. This approach is tested in this paper and used as a reference for the new method proposed later.

2) The second method, called Michigan PSO (MPSO), is still related to the PSO paradigm but uses a Michigan approach; this term is borrowed from the area of genetic classifier systems [12], [13]. To be consistent with the denominations used in that area, the standard PSO is called "Pittsburgh PSO." In the Michigan approach, a member of the population does not encode the whole solution to the problem, but only part of it. The whole swarm is the potential solution to the problem. To implement this behavior, the movement and neighborhood rules of the standard PSO are changed. In previous work [14], the authors compared both approaches (Pittsburgh and Michigan) applied to the rule-discovery binary PSO algorithm. The adaptive MPSO (AMPSO) method proposed in this paper is based on the ideas found in [15]. This paper addresses some problems found in the previous work, includes a new version of the algorithm with population adaptation, and compares the results with the Pittsburgh approach.

The advantages of the Michigan approach versus the conventional PSO approach are the following: 1) reduced dimension of the search space, as particles encode a single prototype, and 2) a flexible number of prototypes in the solution.

Moreover, a refinement of MPSO, called AMPSO, is proposed. This version does not use a fixed population of particles; under certain conditions, we allow particles to reproduce to adapt to a situation where a particle of a single class "detects" training patterns of different classes in its Voronoi region.

The way MPSO/AMPSO performs classification may be related to some standard clustering algorithms like Learning Vector Quantization (LVQ) [16], which also searches for prototypes that represent known data. However, the way these prototypes are found in MPSO/AMPSO is different in the following ways.

1) Particles use both information from the training patterns and information from the neighbor particles to affect their movement. In LVQ and other methods, prototypes are moved depending only on the position of the patterns.

2) Particles use attraction and repulsion rules that include inertia, i.e., velocity retained from previous iterations.

3) We use particle memory (local best position). Even in the absence of outside influence, particles perform a local search around their previous best positions.

In this paper, we were interested in testing MPSO/AMPSO against algorithms of the same family, that is, prototype-based algorithms. Among these algorithms, we have selected 1-NN and 3-NN, LVQ, and the Evolutionary Nearest Prototype Classifier (ENPC) [4] for comparison.

For further reference on MPSO/AMPSO properties, we compare MPSO/AMPSO with classification algorithms based on different learning approaches: J48 (a tree-based algorithm, an implementation of C4.5), PART (rule-based), Naive Bayes, Support Vector Machine (SVM), and Radial Basis Function Neural Network (RBFNN) [17] classifiers, which are successfully applied to several classification problems in [18].

Finally, we also include some algorithms that use an evolutionary approach to extract classification rules, such as GAssist [19] and the Fuzzy Rule Learning Algorithm [20].

This paper is organized as follows: Section II shows how the problem is stated in terms of a Pittsburgh PSO; Section III describes the MPSO and AMPSO, including the encoding of particles and the equations; Section IV describes the experimental setting and the results of experimentation; finally, Section V discusses our conclusions and future work related to this paper.

II. PITTSBURGH APPROACH FOR THE NEAREST PROTOTYPE CLASSIFIER

A. Solution Encoding

The PSO algorithm uses a population of particles whose positions encode a complete solution to an optimization problem. The position of each particle in the search space changes depending on the particle's fitness and the fitness of its neighbors.

Data to be classified are a set of patterns, defined by continuous attributes, and the corresponding class, defined by a scalar value. Depending on the problem, attributes may take values in different ranges; however, before classification, we scale all the attributes to the [0, 1] range. We are aware that this process may have an effect on the classifier accuracy, so the scaled data sets are the ones used to perform all the experiments.

A prototype is analogous to a pattern, so it is defined by a set of continuous values for the attributes, and a class. As a particle encodes a full solution to the problem, we encode a set of prototypes in each particle. Prototypes are encoded sequentially in the particle, and a separate array determines the class of each prototype. This second array does not evolve, so the class for each prototype is defined by its position inside the particle.

TABLE I. ENCODING OF A SET OF PROTOTYPES IN A PARTICLE FOR THE PITTSBURGH PSO

Table I describes the structure of a single particle that can hold N prototypes per class, with D attributes and K classes. For each prototype, classes are encoded as numbers from 0 to K − 1, and the sequence is repeated until prototype N · K. The total dimension of the particle is N · D · K.
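As an illustration of this encoding (a minimal sketch under our own assumptions; the NumPy-based layout and the function names are not from the paper), the following code builds the flat position vector and the fixed class array described in Table I.

```python
import numpy as np

def init_pittsburgh_particle(n_per_class, n_attributes, n_classes, rng=None):
    """Illustrative encoding of one Pittsburgh particle (cf. Table I).

    The position is a flat vector holding N * K prototypes, each with D
    attribute values in [0, 1]; the class array is fixed (it does not evolve):
    classes 0..K-1 repeated until prototype N * K.
    """
    rng = rng or np.random.default_rng()
    n_prototypes = n_per_class * n_classes
    position = rng.random(n_prototypes * n_attributes)    # dimension N * D * K
    classes = np.tile(np.arange(n_classes), n_per_class)  # 0..K-1, repeated N times
    return position, classes

def decode_prototypes(position, n_attributes):
    """Reshape the flat position vector into one row per prototype."""
    return position.reshape(-1, n_attributes)

# Example: N = 10 prototypes per class, D = 4 attributes, K = 3 classes
# gives a particle of dimension 10 * 4 * 3 = 120.
position, proto_classes = init_pittsburgh_particle(10, 4, 3)
assert position.size == 120 and proto_classes.size == 30
```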
\text{Pittsburgh Fitness} = \frac{\text{Good Classifications}}{\text{Number of patterns}} \cdot 100. \quad (1)
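Equation (1) is the success rate (in percent) of the prototype set encoded in a particle over the training patterns. A minimal sketch of how it might be evaluated follows (our own illustration, assuming Euclidean distance and the nearest-prototype rule described in Section I; the function name is ours).

```python
import numpy as np

def pittsburgh_fitness(prototypes, proto_classes, patterns, labels):
    """Success rate (in percent) of a set of prototypes, as in (1).

    prototypes:    (P, D) array of prototype positions
    proto_classes: (P,)   array of prototype classes
    patterns:      (M, D) array of training patterns scaled to [0, 1]
    labels:        (M,)   array of expected classes
    """
    # Euclidean distance from every pattern to every prototype.
    dists = np.linalg.norm(patterns[:, None, :] - prototypes[None, :, :], axis=2)
    predicted = proto_classes[np.argmin(dists, axis=1)]  # nearest-prototype class
    good = np.count_nonzero(predicted == labels)
    return 100.0 * good / len(patterns)
```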
D. Local Fitness Function

In the Michigan approach, each particle has a Local Fitness value that measures its performance as a local classifier. This is the fitness value that is used during the evolution of the swarm to record the best position of the particle.

For this purpose, the algorithm determines the set of patterns to which the particle is the closest in the swarm. We assign the class of the particle to those patterns and determine whether the class matches the expected class for the pattern ("good classification") or not ("bad classification"). Then, we calculate two factors using (6) and (7):

G_f = \sum_{j \in \{g\}} \frac{1}{d_{ij} + 1.0} \quad (6)

B_f = \sum_{j \in \{b\}} \frac{1}{d_{ij} + 1.0} \quad (7)

where
{g} is the set of patterns correctly classified by particle i;
{b} is the set of patterns incorrectly classified by particle i;
d_{ij} is the distance between particle i and pattern j.

In both (6) and (7), we include the distance to the prototypes, so closer patterns have greater influence in the calculation of those factors.

Then, we use (8) to obtain the Local Fitness value for the particle. In this formula, Total is the number of patterns in the training set:

\text{Local Fitness} = \begin{cases} \dfrac{G_f}{\text{Total}} + 2.0, & \text{if } \{g\} \neq \emptyset \text{ and } \{b\} = \emptyset \\ \dfrac{G_f - B_f}{G_f + B_f} + 1.0, & \text{if } \{b\} \neq \emptyset \\ 0, & \text{if } \{g\} = \{b\} = \emptyset. \end{cases} \quad (8)

This fitness function gives higher values (greater than +2.0) to the particles that have only "good classifications," and assigns values in the range [0.0, +2.0] to particles that classify any pattern of a wrong class.

In the lowest range, the particles only take into account local information (the proportion of good to bad classifications made by the particle itself). In the highest range, the particle fitness uses some global information (the total number of patterns to be classified), to be able to rank the fitness of particles with 100% accuracy (particles for which {b} = ∅).

Note that this function is not used to evaluate the whole swarm; instead, whenever a Michigan swarm has to be evaluated, we calculate the success rate (1), as in the Pittsburgh PSO, but using each particle as a prototype. There are more sophisticated functions in the literature that may be used to evaluate local fitness, to take into account the actual distribution of classes. Experimentation with other fitness functions is desirable and will be the subject of future work.

E. Neighborhood for the MPSO

One of the basic features of the Michigan swarm is that particles do not converge to a single point in the search space. To ensure this behavior, interaction among particles is local, i.e., only particles that are close in the search space are able to interact. This means that the neighborhood is calculated dynamically using the positions of the particles at each iteration.

Two different neighborhoods are defined for each particle at each iteration of the algorithm.

1) For each particle of class C_i, noncompeting particles are all the particles of classes C_j ≠ C_i that currently classify at least one pattern of class C_i.

2) For each particle of class C_i, competing particles are all the particles of class C_i that currently classify at least one pattern of that class (C_i).

When the movement for each particle is calculated, that particle is both:

1) Attracted by the closest (in terms of Euclidean distance) noncompeting particle in the swarm, which becomes the "attraction center" (a_i) for the movement. In this way, noncompeting particles guide the search for patterns of a different class.

2) Repelled by the closest competing particle in the swarm, which becomes the "repulsion center" (r_i) for the movement. In this way, competing particles retain diversity and push each other to find new patterns of their class in different areas of the search space.

Other authors have already used the idea of repulsion in PSO in different ways. For instance, in [22], repulsion is used to avoid a complete convergence of the swarm, and, in [23], to increase population diversity in the standard PSO. This allows the swarm to dynamically adapt to changes in the objective function.

F. Social Adaptability Factor

The social part of the algorithm (influence from neighbors) determines that particles are constantly moving toward their noncompeting neighbor and away from their competing neighbor. However, particles that are already located in the proximity of a good position for a prototype should rather try to improve their position and should possibly avoid the influence of neighbors.

To implement this effect, we have generalized the influence of fitness in the sociality terms by introducing a new term in the MPSO equations, called the "Social Adaptability Factor" (S_f), which depends inversely on the "Best Local Fitness" of the particle. In particular, we have chosen the expression in (9):

S_{f_i} = \frac{1}{\text{Best Local Fitness}_i + 1.0}. \quad (9)
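To make the mechanisms of the Local Fitness, the dynamic neighborhood, and the Social Adaptability Factor more concrete, the following sketch is our own illustration (the data structures and names are assumptions, and the velocity-update equations themselves are omitted). It shows how (6)-(8), the attraction and repulsion centers, and (9) might be computed for one particle.

```python
import numpy as np

def local_fitness(particle_pos, particle_class, owned_patterns, owned_labels, total):
    """Local Fitness of one Michigan particle, following (6)-(8).

    owned_patterns / owned_labels: patterns (and their classes) for which this
    particle is currently the closest prototype in the swarm.
    total: number of patterns in the training set (Total in (8)).
    """
    if len(owned_patterns) == 0:                       # {g} = {b} = empty set
        return 0.0
    dists = np.linalg.norm(owned_patterns - particle_pos, axis=1)
    good = owned_labels == particle_class              # set {g}
    bad = ~good                                        # set {b}
    g_f = np.sum(1.0 / (dists[good] + 1.0))            # (6)
    b_f = np.sum(1.0 / (dists[bad] + 1.0))             # (7)
    if not bad.any():                                  # only good classifications
        return g_f / total + 2.0
    return (g_f - b_f) / (g_f + b_f) + 1.0             # some bad classifications

def social_adaptability_factor(best_local_fitness):
    """Social Adaptability Factor of (9)."""
    return 1.0 / (best_local_fitness + 1.0)

def attraction_repulsion_centers(i, positions, classes, classified_classes):
    """Closest noncompeting (a_i) and competing (r_i) particles for particle i.

    classified_classes[j] is the set of pattern classes that particle j
    currently classifies; either center may be None if no neighbor qualifies.
    """
    dists = np.linalg.norm(positions - positions[i], axis=1)
    a_i = r_i = None
    best_a = best_r = np.inf
    for j, d in enumerate(dists):
        if j == i or classes[i] not in classified_classes[j]:
            continue
        if classes[j] != classes[i] and d < best_a:    # noncompeting neighbor
            best_a, a_i = d, positions[j]
        elif classes[j] == classes[i] and d < best_r:  # competing neighbor
            best_r, r_i = d, positions[j]
    return a_i, r_i
```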
G. Adaptive Population

With a fixed population of particles, the MPSO is limited in terms of representation of solutions: it can only find a solution with a maximum number of prototypes. To overcome this limitation, an improved version of MPSO is developed, called AMPSO. AMPSO adjusts the number of particles and their classes to fit the particular problem.

In AMPSO, each particle that classifies a set of patterns of several classes has a probability of giving birth to one particle for each of the classes in that set. This feature must be used with caution; there is a risk of a population explosion if the reproduction rate is too high, and that would worsen the computational cost of the algorithm.

For each particle, we calculate a probability of "reproduction" (P_rep) using (12). We decided to give a higher reproduction rate to particles that have a high local fitness value but still classify some patterns of different classes. Therefore, we introduced the best local fitness of the particle in the equation, scaled to the interval [0, 1].

We also introduced a parameter (p_r) in order to tune the probability of reproduction. Finally, we make P_rep maximum at the start of the swarm iteration (to improve exploration), and we decrease it linearly to its minimum when the maximum iteration is reached.

New particles are placed in the best position of the "parent" particle, and their velocities are randomized:

F_{\text{norm}} = \frac{\text{Best Local Fitness} - \text{Minfit}}{\text{Maxfit} - \text{Minfit}} \quad (10)

It_{\text{norm}} = 1.0 - \frac{\text{Current Iteration}}{\text{Maximum Iterations}} \quad (11)

P_{\text{rep}} = F_{\text{norm}} \times It_{\text{norm}} \times p_r \quad (12)

where
Minfit is the minimum value of the local fitness function;
Maxfit is the maximum value of the local fitness function.
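A small sketch of this birth rule follows (our own illustration; the particle interface, the child-class selection, and the velocity range are assumptions based on the description above and on the velocity clamping used in the experiments).

```python
import random

def reproduction_probability(best_local_fitness, iteration, max_iterations,
                             min_fit, max_fit, p_r):
    """P_rep of (12): favours fit particles and early iterations."""
    f_norm = (best_local_fitness - min_fit) / (max_fit - min_fit)  # (10)
    it_norm = 1.0 - iteration / max_iterations                      # (11)
    return f_norm * it_norm * p_r                                   # (12)

def maybe_reproduce(parent_best_position, extra_classes, p_rep, dim, rng=random):
    """Spawn up to one child per extra class detected in the parent's region.

    Children start at the parent's best position with a randomized velocity
    (assumed uniform in the clamping interval [-1.0, +1.0]).
    """
    children = []
    for cls in extra_classes:
        if rng.random() < p_rep:
            velocity = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            children.append({"position": list(parent_best_position),
                             "velocity": velocity,
                             "class": cls})
    return children
```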
IV. EXPERIMENTATION

A. Problem's Description

We perform experimentation on the problems summarized in Table III. They are well-known real problems taken from the University of California, Irvine, collection, used for comparison with other classification algorithms. All the problems have real-valued attributes, so no transformation was done on the data besides scaling to the [0, 1] interval.

We have selected problems with different numbers of classes and attributes. We also include both balanced and unbalanced problems in terms of class frequency. These problems include some that can be efficiently solved by NN classifiers and others in which the performance of NN classifiers may still be improved.

For each problem and algorithm, we performed ten runs with tenfold cross validation, which gives a total of 100 runs for each.
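As an illustration of this protocol (a sketch under our assumptions; it is not the experimental code of the paper), ten repetitions of tenfold cross validation yield the 100 train/test runs per problem and algorithm.

```python
import numpy as np

def ten_by_tenfold(patterns, labels, run_experiment, n_repeats=10, n_folds=10, seed=0):
    """10 repetitions of 10-fold cross validation = 100 evaluations per problem.

    run_experiment(train_X, train_y, test_X, test_y) is assumed to train a
    classifier and return its success rate on the test fold; attributes are
    assumed to be already scaled to [0, 1].
    """
    rng = np.random.default_rng(seed)
    rates = []
    for _ in range(n_repeats):
        folds = np.array_split(rng.permutation(len(patterns)), n_folds)
        for k in range(n_folds):
            test_idx = folds[k]
            train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            rates.append(run_experiment(patterns[train_idx], labels[train_idx],
                                        patterns[test_idx], labels[test_idx]))
    return float(np.mean(rates)), float(np.std(rates))
```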
B. Parameter Selection

In all the Pittsburgh experiments, we used ten prototypes per class per particle, and 20 particles as the population. We used the parameters suggested in [21]. The neighborhood was "lbest" with three neighbors. With these parameters, the particle dimension becomes quite high for some of the problems; in this approach, the dimension is equal to the number of attributes times the number of classes times ten (from 120 to 540 depending on the problem).

In the Michigan experiments, we used ten particles per class for the initial swarm. The particle dimension is equal to the number of attributes (from four to nine depending on the problem), while the swarm initial population ranges from 20 to 60 particles.

The number of iterations was set to 300 both for the Pittsburgh and Michigan experiments, after checking that this number was roughly equal to double the average iteration in which the best result was achieved.

In order to compare computational costs, note that, for the given parameters, each iteration in the Pittsburgh approach requires 20 times (the Pittsburgh population size) the number of distance evaluations of an iteration in the Michigan approach.

The values of the swarm parameters for MPSO and AMPSO were selected after some preliminary experimentation. This showed that it was better to use a small value for the inertia coefficient (w = 0.1). In all cases, velocity was clamped to the interval [−1.0, +1.0].

Table IV summarizes the values for the rest of the parameters.

C. Experimental Results

In this section, we describe the results of the experiments and perform comparisons between the Pittsburgh PSO and both versions of the MPSO: MPSO, with fixed population, and AMPSO, with adaptive population.

We always use two-tailed t-tests with α = 0.05 to determine the significance of the comparisons. When we present the results with significance tests, all the algorithms were compared with the algorithm placed in the first column.

In all tables in this section, we use the following notation: a "(+)" tag next to the result of an algorithm means that the average result was significantly better than the result in the first column; "(=)" indicates that the difference was not significant; and "(−)" means that the result was significantly worse when compared to the algorithm in the first column. We also use boldface to highlight the best result. When differences are not significant, several algorithms may be marked as providing the best result.

In Table V, we compare the average success rate of the Pittsburgh PSO and MPSO. The results show that MPSO achieves a better success rate than the Pittsburgh version except for the Diabetes and Bupa data sets, where the differences are not significant. Except for those two problems, the performance of the Pittsburgh approach was indeed poor, as shown later when comparing Pittsburgh with other algorithms.

TABLE VI. AVERAGE SUCCESS RATE (IN PERCENT), COMPARISON BETWEEN MPSO AND AMPSO

In Table VI, we compare the average success rate of MPSO with the success rate of AMPSO, which includes particle creation. It shows that AMPSO is better than MPSO in the Diabetes problem, but the difference is much more significant in the Balance Scale problem. As we can see in Table VII, the increase in performance is directly related to AMPSO using a much larger number of prototypes in the solution. It seems that the original configuration in Pittsburgh PSO and MPSO (ten particles per class) is unable to represent an accurate solution to that problem. For the other problems, the original choice on initialization seems enough for plain MPSO to provide a good result, as the average success rate of AMPSO is not significantly greater than that of MPSO.

TABLE VII. AVERAGE NUMBER OF PROTOTYPES IN THE SOLUTION FOR THE THREE ALGORITHMS

In Table VII, we show the number of prototypes used in the solution for each problem and algorithm. Even with a fixed population, in this value we only take into account prototypes that are actually used in the solution to classify at least one pattern. The rest are not considered in the solution, so the average number of prototypes in the table is less than the maximum possible (ten prototypes per class in the current experiments).

On the other hand, AMPSO allows the swarm population to adapt to the problem, so on average it provides solutions with a larger number of prototypes.

The increase in the number of prototypes in the solution in AMPSO is larger for the Balance Scale problem (up to 63 prototypes) and the Diabetes problem. In both cases, AMPSO obtained better results than MPSO.

However, this was not the case in the Wisconsin problem, where an important increase in the number of prototypes did not lead to better results. As the result for this problem is already better than the result of basic NN, it may happen that NN classification cannot be improved much beyond that limit without the application of other techniques.

In Table VIII, we show the average number of prototype evaluations needed to reach the solution of each experiment for each of the algorithms. The purpose of this comparison is only to show that MPSO/AMPSO achieve their result with a lower computational cost than the equivalent Pittsburgh approach, when the number of distance evaluations is considered. This factor is calculated by adding to a counter, on each iteration, the number of prototypes in the whole swarm on that iteration. When the best solution is recorded for an experiment, we also record the value of this counter.

For MPSO and AMPSO, the number of evaluations is similar in order of magnitude. The AMPSO needed more evaluations due to the dynamic creation of particles. However, both versions of the MPSO use fewer evaluations than the Pittsburgh PSO for each of the problems, except for the Iris problem. In the Iris data set, it seems that the Pittsburgh PSO is stuck in a local minimum, as the result in terms of accuracy is poor.

In other problems, the values for the Pittsburgh PSO experiments are significantly greater because each of the particles in the Pittsburgh swarm encodes the same number of prototypes as the whole equivalent Michigan swarm. That is, if the Pittsburgh swarm has a population of N particles, then on each iteration, it performs N times the number of distance evaluations of the Michigan swarm.

In Table IX, the results of AMPSO are compared to the results of NN and prototype-based algorithms. For this comparison, AMPSO is used as the reference algorithm for significance tests.
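The (+)/(=)/(−) tags used in the result tables can be illustrated as follows (a sketch, not the paper's code, using SciPy's two-sample t-test on the per-run success rates; the function name is ours).

```python
from scipy import stats

def significance_tag(reference_rates, other_rates, alpha=0.05):
    """Tag a result against the reference algorithm, as in the result tables.

    Returns "(+)" if 'other' is significantly better than the reference,
    "(-)" if significantly worse, and "(=)" otherwise (two-tailed t-test).
    """
    _, p_value = stats.ttest_ind(reference_rates, other_rates)
    if p_value >= alpha:
        return "(=)"
    mean_other = sum(other_rates) / len(other_rates)
    mean_ref = sum(reference_rates) / len(reference_rates)
    return "(+)" if mean_other > mean_ref else "(-)"
```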
TABLE IX. SUCCESS RATE ON VALIDATION DATA, COMPARISON OF AMPSO VERSUS NN AND PROTOTYPE-BASED ALGORITHMS

For comparison, we have used our own experimentation because published studies are not always usable to perform a proper comparison. This can be due to differences in the data sets, the normalization procedure, and/or the validation strategy (leave-one-out, N-fold cross validation, etc.). The WEKA [24] tool was used for this purpose as it is widely used, and the included algorithm implementations are well tested.

Regarding the ENPC algorithm [4], it is an Evolutionary Algorithm with specific operators that allow for prototype reproduction, competition, and extinction. For experimentation, we have used the original implementation from the authors.

Results show that AMPSO performs at least as well as the basic NN classifiers (both 1-NN and 3-NN) on these problems. AMPSO is significantly better than these algorithms in five out of seven cases. The algorithm is able to improve the result because of the placement of the prototypes and probably also due to the elimination of the effect of noise in the training patterns. However, this latter point should be tested explicitly in further experiments.

Compared to LVQ and ENPC, AMPSO is also shown to be very competitive, as only ENPC gives a better result for one of the problems.

However, the Pittsburgh PSO approach is not so competitive, only being able to produce good results in the Bupa and Balance Scale problems.

TABLE X. SUCCESS RATE ON VALIDATION DATA, COMPARISON OF AMPSO VERSUS OTHER CLASSIFICATION ALGORITHMS

The results of AMPSO are compared to the results of commonly used classification algorithms in Table X. For this comparison, AMPSO is used as the reference algorithm for significance tests.

The algorithms in Table X are of two classes. The first five are nonevolutionary algorithms that are based on quite different learning paradigms:
1) J48, an implementation of the C4.5 tree-based algorithm;
2) PART, a rule-based classifier;
3) SMO, an SVM method;
4) RBFNN, an implementation of RBFNNs, which have successfully been applied to these problems [18].

The other two algorithms obtain classification rules using an underlying evolutionary algorithm. Experiments with these two algorithms were performed using the Keel [25] tool.
1) GAssist-ADI [19] searches for classification rules encoded using adaptive discrete intervals.
2) The Fuzzy Rule Learning Algorithm [20] extracts fuzzy rules, also using a GA.

Results suggest that, for these problems, AMPSO outperforms the J48 and PART algorithms, and also both of the GA-based algorithms (GAssist and Fuzzy Rule Extraction). However, Naive Bayes, SMO, and RBFNN are much harder to improve upon. Overall, AMPSO is the best or equal to the best algorithm in five of the problems.

In the Glass problem, AMPSO is significantly better than any other algorithm. Moreover, ENPC improves on all the rest significantly. This suggests that, for this problem, evolutionary placement of prototypes is a good solution. To our knowledge, for this problem, AMPSO has the best success rate in the literature.

V. CONCLUSION

The purpose of this paper is to study different versions of PSO applied to continuous classification problems. With this goal, we develop three different versions of Nearest Prototype Classifiers. These algorithms are used to locate a small set of prototypes that represent the data sets used for experimentation without losing classification accuracy.

The first version is an application of the standard PSO, which we call Pittsburgh PSO. In this case, we encode a full set of prototypes in each particle. However, this produces a search space of high dimension that prevents the algorithm from achieving
good results. As a first alternative, we propose MPSO, in which each particle represents a single prototype, and the solution is a subset of the particles in the swarm, thus reducing the search space in an effort to obtain better performance.

In both algorithms, a maximum number of prototypes has to be specified for each class in the data set; this is a parameter of the algorithm, and its value becomes a limit on the complexity of the solution that may be encoded. To reduce this limitation, we propose a version of MPSO, called AMPSO, which adaptively changes both the number of particles in the swarm and the class distribution of these particles. In this algorithm, only the initial population has to be specified, and the total number of particles of each class may increase during the run of the algorithm.

The MPSO algorithms (both MPSO and AMPSO) introduce a local fitness function to guide the particles' movement and dynamic neighborhoods that are calculated on each iteration. These mechanisms ensure that particles do not converge to a single point in the search space. Particles are grouped in neighborhoods that depend on their class; each particle competes and cooperates with the closest neighbors to perform classification of the patterns in its proximity.

We have tested the algorithm on seven well-known benchmark problems that use continuous attributes. We have found that the results of AMPSO and MPSO are always equal to or better than the standard NN classifiers. In most cases, the number of prototypes that compose the solutions found by both MPSO algorithms is quite reduced. This proves that an MPSO can be used to produce a small but representative set of prototypes for a data set. The adaptive version (AMPSO) always produces equal or better solutions than MPSO; it is able to increase accuracy by increasing the number of prototypes in the solution in some of the problems. As for the Pittsburgh PSO, its results never improve on the results of the Michigan versions, and it has higher computational costs.

When the results are compared to other classifiers, AMPSO can produce competitive results in all the problems, especially when used on data sets where the 1-NN classifier does not perform very well. Finally, AMPSO significantly outperforms all the algorithms on the Glass Identification data set, where it achieves more than 10% improvement on average, being the best result found in the literature up to this moment on this data set.

It is clear that further work could improve the algorithm's performance if it makes AMPSO able to adaptively tune important parameters (such as the reproduction and deletion rates) to the problem. Moreover, any technique that may improve NN classifiers' performance could be applied to AMPSO, such as using a different distance measure or using more than one neighbor for classification.

In summary, our proposed MPSO is able to obtain Nearest Prototype classifiers which provide better or equal results than the most commonly used classification algorithms in problems of different characteristics. Compared to the standard Pittsburgh approach, the Michigan versions provide the following advantages.

1) They attain better results than the Pittsburgh PSO approach due to the reduction in the dimensionality of the search space. This means they can also reduce the computational cost.

2) This approach provides the possibility to adjust the complexity of the solution in an adaptive manner. When used (in AMPSO), the resulting algorithm may compete with most of the commonly used classification algorithms.

3) It provides an easy way of implementing competitive and cooperative subsets of the population, as opposed to the Pittsburgh approach, in which all the particles interact equally with all their neighbors.

As a result, we think that a Michigan approach for PSO is worth generalizing and investigating further in other applications.

REFERENCES

[1] J. Kennedy, R. Eberhart, and Y. Shi, Swarm Intelligence. San Francisco, CA: Morgan Kaufmann, 2001.
[2] H. Brighton and C. Mellish, "Advances in instance selection for instance-based learning algorithms," Data Mining Knowl. Discovery, vol. 6, no. 2, pp. 153–172, Apr. 2002.
[3] D. R. Wilson and T. R. Martinez, "Reduction techniques for instance-based learning algorithms," Mach. Learn., vol. 38, no. 3, pp. 257–286, Mar. 2000. [Online]. Available: citeseer.ist.psu.edu/article/wilson00reduction.html
[4] F. Fernández and P. Isasi, "Evolutionary design of nearest prototype classifiers," J. Heuristics, vol. 10, no. 4, pp. 431–454, Jul. 2004.
[5] T. Sousa, A. Silva, and A. Neves, "Particle swarm based data mining algorithms for classification tasks," Parallel Comput., vol. 30, no. 5/6, pp. 767–783, May/Jun. 2004.
[6] Z. Wang, X. Sun, and D. Zhang, Classification Rule Mining Based on Particle Swarm Optimization, vol. 4062/2006. Berlin, Germany: Springer-Verlag, 2006.
[7] Z. Wang, X. Sun, and D. Zhang, A PSO-Based Classification Rule Mining Algorithm, vol. 4682/2007. Berlin, Germany: Springer-Verlag, 2007.
[8] A. A. A. Esmin, "Generating fuzzy rules from examples using the particle swarm optimization algorithm," in Proc. HIS, 2007, pp. 340–343.
[9] N. P. Holden and A. A. Freitas, "A hybrid PSO/ACO algorithm for classification," in Proc. GECCO Conf. Companion Genetic Evol. Comput., 2007, pp. 2745–2750.
[10] A. Cervantes, P. Isasi, and I. Galván, "Binary particle swarm optimization in classification," Neural Netw. World, vol. 15, no. 3, pp. 229–241, 2005.
[11] I. D. Falco, A. D. Cioppa, and E. Tarantino, Evaluation of Particle Swarm Optimization Effectiveness in Classification. Berlin, Germany: Springer-Verlag, 2006.
[12] J. Holland, "Adaptation," in Progress in Theoretical Biology. New York: Academic, 1976, pp. 263–293.
[13] S. W. Wilson, "Classifier fitness based on accuracy," Evol. Comput., vol. 3, no. 2, pp. 149–175, 1995.
[14] A. Cervantes, P. Isasi, and I. Galván, "A comparison between the Pittsburgh and Michigan approaches for the binary PSO algorithm," in Proc. IEEE CEC, 2005, pp. 290–297.
[15] A. Cervantes, I. Galván, and P. Isasi, "Building nearest prototype classifiers using a Michigan approach PSO," in Proc. IEEE SIS, 2007, pp. 135–140.
[16] T. Kohonen, Self-Organizing Maps. Berlin, Germany: Springer-Verlag, 1995.
[17] M. J. D. Powell, "Radial basis functions for multivariable interpolation: A review," in Algorithms for Approximation of Functions and Data. Oxford, U.K.: Oxford Univ. Press, 1987, pp. 143–167.
[18] D. Yeung, W. Ng, D. Wang, E. Tsang, and X.-Z. Wang, "Localized generalization error model and its application to architecture selection for radial basis function neural network," IEEE Trans. Neural Netw., vol. 18, no. 5, pp. 1294–1305, Sep. 2007.
[19] J. Bacardit and J. M. Garrell i Guiu, "Evolving multiple discretizations with adaptive intervals for a Pittsburgh rule-based learning classifier system," in Proc. GECCO, 2003, pp. 1818–1831.
[20] H. Ishibuchi, T. Nakashima, and T. Murata, "Performance evaluation of fuzzy classifier systems for multidimensional pattern classification problems," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 29, no. 5, pp. 601–618, Oct. 1999.
[21] D. Bratton and J. Kennedy, "Defining a standard for particle swarm optimization," in Proc. IEEE SIS, Apr. 1–5, 2007, pp. 120–127.
[22] T. M. Blackwell and P. J. Bentley, "Don't push me! Collision-avoiding swarms," in Proc. IEEE CEC, 2002, pp. 1691–1696.
[23] T. Blackwell and P. J. Bentley, "Dynamic search with charged swarms," in Proc. GECCO, 2002, pp. 19–26.
[24] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques. San Francisco, CA: Morgan Kaufmann, 2005.
[25] J. Alcala-Fdez, S. Garcia, F. Berlanga, A. Fernandez, L. Sanchez, M. del Jesus, and F. Herrera, "Keel: A data mining software tool integrating genetic fuzzy systems," in Proc. 3rd Int. Workshop GEFS, Mar. 2008, pp. 83–88.

Alejandro Cervantes was born in Barcelona, Spain, in 1968. He received the Telecommunications Engineer degree from Universidad Politécnica de Madrid, Madrid, Spain, in 1993. He is currently working toward the Ph.D. degree in the Department of Computer Science, University Carlos III of Madrid.
He is currently an Assistant Teacher with the Department of Computer Science, University Carlos III of Madrid. His current interests focus on swarm intelligence algorithms such as particle swarm, ant colony, and cultural algorithms, both for classification and multiobjective optimization problems.

Inés María Galván received the Ph.D. degree in computer science from Universidad Politécnica de Madrid, Madrid, Spain, in 1998.
From 1992 to 1995, she held a Doctorate Fellowship as a Research Scientist at the European Commission Joint Research Centre, Ispra, Italy. Since 1995, she has been with the Department of Computer Science, University Carlos III of Madrid, Madrid, where she has been an Associate Professor since 2000. Her current research focuses on artificial neural networks and evolutionary computation techniques such as genetic algorithms, evolutionary strategies, and particle swarm.

Pedro Isasi received the Ph.D. degree in computer science from Universidad Politécnica de Madrid (UPM), Madrid, Spain, in 1990.
He is currently a Professor in computer science with University Carlos III of Madrid, Madrid, Spain, where he is also the Head of the Department of Computer Science and Founder and Director of the Neural Network and Evolutionary Computation Laboratory. His principal research interests are in the fields of machine learning, metaheuristics, and evolutionary optimization methods, mainly applied to finance and economics, forecasting, and classification.