Dynamic Classifier Selection
Article history: Received 5 July 2017; Revised 1 September 2017; Accepted 10 September 2017; Available online 11 September 2017.

Keywords: Multiple classifier systems; Ensemble of classifiers; Dynamic classifier selection; Dynamic ensemble selection; Classifier competence; Survey.

Abstract: Multiple Classifier Systems (MCS) have been widely studied as an alternative for increasing accuracy in pattern recognition. One of the most promising MCS approaches is Dynamic Selection (DS), in which the base classifiers are selected on the fly, according to each new sample to be classified. This paper provides a review of the DS techniques proposed in the literature from a theoretical and empirical point of view. We propose an updated taxonomy based on the main characteristics found in a dynamic selection system: (1) the methodology used to define a local region for the estimation of the local competence of the base classifiers; (2) the source of information used to estimate the level of competence of the base classifiers, such as local accuracy, oracle, ranking and probabilistic models; and (3) the selection approach, which determines whether a single or an ensemble of classifiers is selected. We categorize the main dynamic selection techniques in the DS literature based on the proposed taxonomy. We also conduct an extensive experimental analysis, considering a total of 18 state-of-the-art dynamic selection techniques, as well as static ensemble combination and single classification models. To date, this is the first analysis comparing all the key DS techniques under the same experimental protocol. Furthermore, we also present several perspectives and open research questions that can be used as a guide for future works in this domain.
Fig. 1. The three possible phases of an MCS system. In the first stage, a pool of classifiers C = {c1, ..., cM} (M is the number of classifiers) is generated. In the second phase, an Ensemble of Classifiers (EoC), C' ⊆ C, is selected. In the last phase, the decisions of the selected base classifiers are aggregated to give the final decision.
Fig. 2. Taxonomy of an MCS system considering the three main phases. The selection stage is highlighted since it is the focus of this review.
4. Different classifier models: This method involves the combination of different classification models (decision tree, K-Nearest Neighbor (K-NN) and SVM, for example). In this case, diversity is due to the intrinsic properties of each model, which change the way the decision boundaries are generated. Systems that use different classifier models are often called heterogeneous models.
5. Different training sets: In this case, each base classifier is trained using a different distribution of the training set. The Bagging [50,51], Boosting [52,53] and clustering-based classifier generation approaches [10,54] are examples of generation methods that are based on this paradigm.
6. Different feature sets: This methodology is used in applications where the data can be represented in distinct feature spaces. In face recognition, for example, multiple feature extraction methods can be applied to extract distinct sets of features based on a face image [43]. The same principle applies to handwriting recognition [55–57] and music genre classification [58]. Each base expert can be trained based on a different feature extraction method. Furthermore, different feature spaces can be generated based on one feature space through the selection of a subset of features, just as in the random subspace method [59,60] (see the sketch following this list).
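As a minimal illustration of strategies 5 and 6, the sketch below (assuming scikit-learn and a synthetic dataset; all parameter values are arbitrary) generates one pool of decision trees from bootstrap samples and another from random feature subspaces.

```python
# Minimal sketch: generating a diverse pool of classifiers with scikit-learn.
# Bagging (item 5) trains each base classifier on a bootstrap sample of the
# training set; the random subspace method (item 6) instead gives each
# classifier a random subset of the features. Dataset and parameter values
# are illustrative assumptions, not taken from the survey.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Item 5: different training sets (Bagging).
bagging_pool = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                                 bootstrap=True, random_state=0).fit(X, y)

# Item 6: different feature sets (random subspaces, 50% of the features each).
subspace_pool = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                                  bootstrap=False, max_features=0.5,
                                  bootstrap_features=False,
                                  random_state=0).fit(X, y)

print(len(bagging_pool.estimators_), len(subspace_pool.estimators_))
```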
Empirical studies of different generation strategies can be found in [61,62]. It should be mentioned that more than one strategy can be used together. For example, in the Random Forest [63,64] and Rotation Forest [65] techniques, each decision tree can be trained using different feature sets, different divisions of the dataset, as well as different configurations of hyper-parameters.

2.2. Selection

The selection stage can be conducted either in a static or dynamic fashion. Fig. 3 presents the differences between the selection approaches. In static selection methods, the EoC, C', is selected during the training phase, according to a selection criterion estimated on the validation dataset. The same ensemble C' is used to predict the label of all test samples in the generalization phase. The most common selection criteria used for selecting static ensembles are diversity [66–70] and classification accuracy [46,71]. Many search algorithms have been considered for static selection, such as greedy search [72,73], evolutionary algorithms [71,74,75] and other heuristic approaches [69,76].

In contrast, in dynamic selection, a single classifier or an ensemble is selected specifically to classify each unknown example. Based on a pool of classifiers C, dynamic selection techniques consist in finding a single classifier ci, or an ensemble of classifiers C' ⊆ C, containing the most competent classifiers for the classification of a specific query, xj. The rationale for dynamic selection techniques is that each base classifier is an expert in distinct regions of the feature space; the method aims to select the most competent classifiers in the local region where xj is located.

2.3. Aggregation

The aggregation phase consists in fusing the outputs obtained by the selected classifiers according to a combination rule. The combination of the base classifiers can be performed based on the class labels, such as in the Majority Voting scheme, or by using the scores obtained by the base classifiers for each of the classes in the classification problem. In the latter approach, the scores obtained by the base classifiers are interpreted as fuzzy class memberships [77] or as the posterior probability that a sample xj belongs to a given class [78].

There are three main strategies for the aggregation phase: non-trainable, trainable and dynamic weighting.

2.3.1. Non-trainable

Several non-trainable rules for combining classifiers have been proposed [47,79]. Examples of such aggregation methods are the Sum, Product, Maximum, Minimum, Median and Majority Voting schemes [79], Borda count [80], Behavior Knowledge Space [81], Decision Templates [82] and Dempster–Shafer combination [83,84]. The effectiveness of different aggregation rules has been analyzed in several works [6,47,80,85–87]. As reported in [47], the problem with non-trainable combination rules is that they require certain assumptions about the base classifiers in order to obtain good performance. For instance, the Majority Voting and Product rules are effective if the base classifiers are independent, while the Sum rule produces good results when the base classifiers have independent noise behavior.
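The sketch below illustrates two of these non-trainable rules, majority voting on the predicted labels and the sum rule on class supports, assuming base classifiers with a scikit-learn style interface and labels encoded as 0, ..., L-1.

```python
# Minimal sketch of two non-trainable combination rules, assuming each base
# classifier follows the scikit-learn interface (predict / predict_proba).
import numpy as np

def majority_voting(pool, X):
    """Majority voting on the predicted labels (assumes labels 0..L-1)."""
    votes = np.array([clf.predict(X) for clf in pool]).astype(int)  # (n_classifiers, n_samples)
    n_classes = votes.max() + 1
    counts = np.apply_along_axis(lambda v: np.bincount(v, minlength=n_classes),
                                 axis=0, arr=votes)                 # (n_classes, n_samples)
    return counts.argmax(axis=0)

def sum_rule(pool, X):
    """Sum (average) of the class supports given by each classifier."""
    supports = np.mean([clf.predict_proba(X) for clf in pool], axis=0)
    return supports.argmax(axis=1)
```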
Fig. 3. Differences between static selection, dynamic classifier selection (DCS) and dynamic ensemble selection (DES). In static selection, the EoC is selected based on the
training or validation data. In the dynamic selection approaches, the selection is based on each test sample xj.
Fig. 4. Taxonomy of dynamic selection systems. The taxonomy of the selection criteria is based on the previous taxonomy proposed by Britto et al. [24].
Although it is possible to achieve results higher than the Oracle by working on the supports given by the base classifiers [21,100], from a dynamic selection point of view, the Oracle is regarded in the literature as a possible upper limit for the performance of MCS, and as such, it is widely used to compare the performances of different dynamic selection schemes [101]. The Oracle can thus measure how close a DCS technique is to the upper limit performance, for a given pool of classifiers, and indicates whether there is still room for improvement in terms of classification accuracy.

It has however been shown that there is a significant performance gap between DS schemes and the Oracle [26,27,101]. Didaci et al. [101] stated that the Oracle is too optimistic to be considered as an upper bound for dynamic selection techniques. In fact, the Oracle can correctly classify instances that should not be correctly classified based on Bayesian decision theory [21].

3. Dynamic selection

In dynamic selection, the classification of a new query sample usually involves three steps:

1. Definition of the region of competence; that is, how to define the local region surrounding the query, xj, in which the competence level of the base classifiers is estimated.
2. Determination of the selection criteria used to estimate the competence level of the base classifiers, e.g., Accuracy, Probabilistic and Ranking.
3. Determination of the selection mechanism that chooses a single classifier (DCS) or an ensemble of classifiers (DES) based on their estimated competence level.

Fig. 4 presents the taxonomy of DS considering these three aspects. DS methods can be improved by working on each of these points. For instance, the approaches proposed in [102–104] aim to improve DS techniques by obtaining better estimates of the regions of competence, while several works are based on new criteria for estimating the competence level of the base classifiers [26,27,30,31,34,105,106]. These three aspects are detailed in the following sections.

3.1. Region of competence definition

The definition of a local region is of fundamental importance to DS methods, since the performance of all DS techniques is very sensitive to the distribution of this region [104,107,108]. Indeed, many recent papers have pointed out that it is possible to improve the performance of DS methods just by working on better defining these regions [102–104,109–111].

Usually, the local regions are defined using the K-NN technique [26,31], via clustering methods (e.g., K-Means) [29,30], using the decisions of the base classifiers [106,112,113], or using a competence map defined through a potential function [25,114]. In all cases, a set of labeled samples, which can be either the training or the validation set, is required. This set is called the dynamic selection dataset (DSEL) [27].

3.1.1. Clustering

In techniques that use clustering to define the region of competence [29,30,115,116], the first step is to define the clusters in DSEL. Next, the competence of each base classifier is estimated for all clusters. During the generalization stage, given a new test sample, xj, the distance between the test sample and the centroid of each cluster is calculated. The competence of the base classifiers is then measured based on the samples belonging to the nearest cluster.

The advantage of using the clustering technique is that all the rankings and classifier selections are estimated during the training phase. For each cluster, the EoC is defined a priori. Hence, DS techniques based on clustering are much faster during the generalization phase. In addition, only the distance between the query sample and the centroids of each cluster needs to be estimated, rather than the distances to all instances in DSEL.

3.1.2. K-Nearest Neighbors

In the case of the K-NN technique, the K nearest neighbors of the query sample, xj, are estimated using the dynamic selection dataset (DSEL). The set with the K nearest neighbors is called the region of competence and is denoted by θj = {x1, ..., xK}. Then, the competence of the base classifiers is estimated taking into account only the instances belonging to this region of competence.

The advantage of using K-NN over clustering is that K-NN allows a more precise estimation of the local region, which leads to many different configurations of the EoC according to the classification of the new instances [30,116]. However, there is a higher computational cost involved when using K-NN rather than clustering, since the distance between the query and the whole DSEL needs to be
estimated prior to estimating the classifiers' competence. This is a problem especially when dealing with large-sized datasets [107].

Since the definition of the region of competence plays a very important role in the accuracy of DS techniques, some works have evaluated different versions of the K-NN algorithm for a better estimation of such regions. In [107], the authors considered an adaptive K-NN proposed in [117,118], which shifts the region of competence from the class border to the class centers, so that samples that are more likely to be noise are less likely to be selected to compose the region of competence. In [39], the authors used the K-Nearest Neighbors Equality (K-NNE) [119] to estimate the regions of competence. Didaci and Giacinto [103] evaluated the impact of an adaptive neighborhood for dynamic classifier selection techniques, studying both the choice of a better-suited distance metric to compute the neighborhood and a suitable choice of the neighborhood size.

3.1.3. Potential function model

These methods are inspired by the work of Rastrigin and Erenstein [114], which is one of the first works to provide a methodology for dynamic selection. They differ from the majority of the other techniques in the DS literature in that they use the whole dynamic selection dataset for the computation of competence, rather than only the neighborhood of the test sample. However, the influence of each data point xk ∈ DSEL is weighted by its Euclidean distance to the query xj using a potential function model. Usually, a Gaussian potential function is considered (Eq. (1)), so that the points closer to the query have a higher influence on the estimation of the classifiers' competence:

K(x_k, x_j) = \exp(-d(x_k, x_j)^2)    (1)

Several DS techniques have been proposed using the potential function model: Dynamic Ensemble Selection based on Kullback–Leibler divergence (DES-KL) [34], the technique based on the randomized reference classifier (RRC) [25], and the DCS methods based on logarithmic and exponential functions [120].

Using this class of methods to define the regions of competence has the advantage of removing the need to set the neighborhood size a priori, as the potential function K(xk, xj) is used to reduce the influence of each data point based on its Euclidean distance to the query. However, its drawback is the increased computational cost involved in computing the competence of the base classifiers, since the whole DSEL, and not just the neighborhood of the query sample, is used for the competence estimation.
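The following sketch illustrates the idea of a potential-function-based competence map; using a simple correct/incorrect indicator as the source of competence is an assumption made for brevity (DES-KL and the RRC use richer sources, as discussed in Section 4).

```python
# Minimal sketch: potential-function-based competence estimation.
# Every sample in DSEL contributes to the competence of classifier `clf`,
# weighted by the Gaussian potential function of Eq. (1). The use of
# "1 if correct, 0 otherwise" as the source of competence is a simplifying
# assumption for illustration purposes.
import numpy as np

def potential_competence(clf, X_dsel, y_dsel, x_query):
    dist2 = np.sum((X_dsel - x_query) ** 2, axis=1)     # squared Euclidean distances
    weights = np.exp(-dist2)                            # Eq. (1): K(x_k, x_j)
    correct = (clf.predict(X_dsel) == y_dsel).astype(float)
    return np.sum(weights * correct) / np.sum(weights)  # weighted local accuracy
```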
3.1.4. Decision space

The DS techniques in this category are based on the behavior of the pool of classifiers, using the classifiers' predictions as information. They are inspired by the Behavior Knowledge Space (BKS) [81], and the resulting representation is often called the "decision space" [106,112], since it is based on the decisions made by the base classifiers.

An important aspect of this class of techniques is the transformation of the test and training samples into output profiles. This transformation can be conducted by using the hard decisions of the base classifiers (e.g., the predicted class labels), as in the BKS method, or by using the estimated posterior probabilities of the base classifiers, as suggested in [106,113,121]. The output profile of an instance xj is denoted by x̃j = {x̃j,1, x̃j,2, ..., x̃j,M}, where each x̃j,i is the decision yielded by the base classifier ci for the sample xj.

Then, the region of competence is computed from the similarity between the output profile of the query, x̃j, and the output profiles of the samples in DSEL. The set with the most similar output profiles, denoted by φj, is used to estimate the competence level of the base classifiers. Examples of techniques that use a competence region defined in the decision space are the Multiple Classifier Behavior (MCB) [112], the K-Nearest Output Profiles (KNOP) [106,121] and META-DES [27,99].
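The sketch below illustrates the output-profile transformation and the retrieval of the most similar profiles; building profiles from class supports and comparing them with the Euclidean distance are illustrative choices rather than the exact formulation of any specific technique.

```python
# Minimal sketch: building output profiles and finding the most similar ones
# in the decision space. Profiles are built from class supports here; using
# hard labels (as in BKS) would work analogously.
import numpy as np

def output_profile(pool, x):
    """Concatenate the class supports given by every base classifier for x."""
    x = np.atleast_2d(x)
    return np.concatenate([clf.predict_proba(x)[0] for clf in pool])

def most_similar_profiles(pool, X_dsel, x_query, k=7):
    """Indices of the K DSEL samples whose output profiles are closest to the query's."""
    profiles = np.array([output_profile(pool, x) for x in X_dsel])
    query_profile = output_profile(pool, x_query)
    dists = np.linalg.norm(profiles - query_profile, axis=1)
    return np.argsort(dists)[:k]        # the set phi_j of Section 3.1.4
```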
3.2. Selection criteria

The criterion used to measure the competence level of the base classifiers for the classification of xj is a key component of any dynamic selection technique. Based on [24], the criteria can be organized into two groups (Fig. 4): individual-based and group-based measures. The former comprises measures in which the individual performance of the base classifier is used to estimate its level of competence; the competence of each base classifier ci is measured independently of the performance of the other base classifiers in the pool. This category can be divided into several subgroups according to the type of information used to measure the competence of the base classifiers, namely Ranking [31,33], Accuracy [31,102], Probabilistic [25,66,122], Behavior [106,112], Oracle [26], Data complexity [123] and Meta-learning [27,124,125]. It must be mentioned that the system based on meta-learning presents a different perspective, in which the competence of a base classifier is "learned" based on different sources of information.

The group-based measures are composed of metrics that take into account the interaction between the classifiers in the pool. This category can be further divided into three subgroups [24]: Diversity [30,71], Data Handling [126] and Ambiguity [127]. These measures are not directly related to the notion of competence of a base classifier, but rather to the notion of relevance, i.e., whether the base classifier works well in conjunction with the other classifiers in the ensemble. These techniques are based on the performance of the base classifier in relation to the performance of a pre-selected ensemble of classifiers. For instance, in [30], an ensemble with the most accurate classifiers is selected first (thus, local accuracy is the criterion used to select the most competent classifiers individually); next, the system selects the base classifiers that are more diverse in relation to the pre-selected ones, in order to add more diversity to the EoC.

3.3. Selection approach

Regarding the selection approach, dynamic selection techniques can select either a single classifier (dynamic classifier selection, DCS) or an ensemble of classifiers (dynamic ensemble selection, DES). Early works in dynamic selection started with the selection of a single classifier rather than an ensemble of classifiers (EoC). In such techniques, only the classifier that attained the highest competence level is used for the classification of the given test sample. Examples of DCS methods are the A Priori and A Posteriori methods [101], as well as the Multiple Classifier Behavior (MCB) [112].

However, given the fact that selecting only one classifier can be highly error-prone, some researchers decided to select a subset of the pool of classifiers rather than just a single base classifier. All base classifiers that obtain a certain competence level are used to compose the EoC, and their outputs are aggregated to predict the label of xj. Examples of DES techniques are the K-Nearest Oracles (KNORA) [26], the K-Nearest Output Profiles (KNOP) [106], the method based on the Randomized Reference Classifier (DES-RRC) [25] and the META-DES framework [27,99]. Another reason for selecting an EoC rather than a single classifier is that, frequently, several base classifiers present the same competence level locally; in such cases, there is no reason to randomly pick one of them rather than selecting all of them.
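Putting the three steps together, the following sketch contrasts a DCS and a DES decision on top of a K-NN region of competence with local accuracy as the competence criterion; the neighborhood size and the competence threshold are arbitrary illustrative values.

```python
# Minimal sketch contrasting DCS (pick one classifier) and DES (pick a subset),
# using a K-NN region of competence and local accuracy as the competence measure.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def region_of_competence(X_dsel, x_query, k=7):
    nn = NearestNeighbors(n_neighbors=k).fit(X_dsel)
    return nn.kneighbors(np.atleast_2d(x_query), return_distance=False)[0]

def local_accuracy(clf, X_dsel, y_dsel, idx):
    return np.mean(clf.predict(X_dsel[idx]) == y_dsel[idx])

def dcs_predict(pool, X_dsel, y_dsel, x_query):
    idx = region_of_competence(X_dsel, x_query)
    competences = [local_accuracy(clf, X_dsel, y_dsel, idx) for clf in pool]
    best = int(np.argmax(competences))                 # DCS: single most competent classifier
    return pool[best].predict(np.atleast_2d(x_query))[0]

def des_predict(pool, X_dsel, y_dsel, x_query, threshold=0.5):
    idx = region_of_competence(X_dsel, x_query)
    selected = [clf for clf in pool
                if local_accuracy(clf, X_dsel, y_dsel, idx) > threshold]
    if not selected:                                   # fall back to the whole pool
        selected = list(pool)
    votes = [clf.predict(np.atleast_2d(x_query))[0] for clf in selected]
    return np.bincount(votes).argmax()                 # DES: majority vote of the EoC
```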
Table 1
Categorization of DS methods, ordered by year of publication. All methods presented in this table are later considered in our comparative study (Section 6).
Technique | Region of competence definition | Selection criteria | Selection approach | Reference | Year
Classifier Rank (DCS-Rank) | K-NN | Ranking | DCS | Sabourin et al. [33] | 1993
Overall Local Accuracy (OLA) | K-NN | Accuracy | DCS | Woods et al. [31] | 1997
Local Class Accuracy (LCA) | K-NN | Accuracy | DCS | Woods et al. [31] | 1997
A Priori | K-NN | Probabilistic | DCS | Giacinto [129] | 1999
A Posteriori | K-NN | Probabilistic | DCS | Giacinto [129] | 1999
Multiple Classifier Behavior (MCB) | K-NN | Behavior | DCS | Giacinto et al. [112] | 2001
Modified Local Accuracy (MLA) | K-NN | Accuracy | DCS | Smits [32] | 2002
DES-Clustering | Clustering | Accuracy & Diversity | DES | Soares et al. [30,116] | 2006
DES-KNN | K-NN | Accuracy & Diversity | DES | Soares et al. [30,116] | 2006
K-Nearest Oracles Eliminate (KNORA-E) | K-NN | Oracle | DES | Ko et al. [26] | 2008
K-Nearest Oracles Union (KNORA-U) | K-NN | Oracle | DES | Ko et al. [26] | 2008
Randomized Reference Classifier (RRC) | Potential function | Probabilistic | DES | Woloszynski et al. [25] | 2011
Kullback–Leibler (DES-KL) | Potential function | Probabilistic | DES | Woloszynski et al. [34] | 2012
DES Performance (DES-P) | Potential function | Probabilistic | DES | Woloszynski et al. [34] | 2012
K-Nearest Output Profiles (KNOP) | K-NN | Behavior | DES | Cavalin et al. [106] | 2013
META-DES | K-NN | Meta-Learning | DES | Cruz et al. [27] | 2015
META-DES.Oracle | K-NN | Meta-Learning | DES | Cruz et al. [130] | 2016
Dynamic Selection On Complexity (DSOC) | K-NN | Accuracy & Complexity | DCS | Brun et al. [123] | 2016
In this section, we present a review of the most relevant dynamic selection algorithms. The DS techniques were chosen taking into account their importance in the literature through the introduction of new concepts in the area (i.e., methods that introduced different ways of defining the competence region or the selection criteria), their number of citations, as well as the availability of source code (code for all 18 DS techniques is available upon request). Minor variations of an existing technique, such as the different versions of the KNORA-E technique proposed in [102] and [98], were not considered. Furthermore, we gave more emphasis to the techniques proposed in the last four years, since they were published after the last reviews in MCS [21,22,24,128].

Table 1 categorizes the key dynamic selection techniques described in this review according to our proposed taxonomy. These techniques are used in our experimental evaluation conducted in Section 6, and they are detailed in the next sections. Moreover, in Section 4.19, we present the use of DS techniques in different pattern recognition contexts (e.g., One-Class Classification and One-Versus-One decomposition).

4.1. Modified Classifier Ranking (DCS-Rank)

In the Modified Classifier Ranking method [31,33], the ranking of a single base classifier ci is simply estimated by the number of consecutive correctly classified samples in the region of competence θj. The classifier that correctly classifies the highest number of consecutive samples is considered to have the highest "rank", and is selected as the most competent classifier for the classification of xj.

4.2. Overall Local Accuracy (OLA)

In this method [31], the level of competence, δi,j, of a base classifier ci is simply computed as its classification accuracy in the region of competence θj (Eq. (2)). The classifier presenting the highest competence level is selected to predict the label of xj.

\delta_{i,j} = \frac{1}{K} \sum_{k=1}^{K} P(\omega_l \mid x_k \in \omega_l, c_i)    (2)

4.3. Local Class Accuracy (LCA)

The LCA technique [31] is similar to OLA, the only difference being that the local accuracy is estimated with respect to the output class ωl (ωl is the class assigned to xj by ci) over the whole region of competence (Eq. (3)). The classifier presenting the highest competence level, δi,j, is selected to predict the label of xj.

\delta_{i,j} = \frac{\sum_{x_k \in \omega_l} P(\omega_l \mid x_k, c_i)}{\sum_{k=1}^{K} P(\omega_l \mid x_k, c_i)}    (3)

4.4. A Priori

The A Priori method [129] considers the probability of correct classification of the base classifier ci in θj, taking into account the supports obtained by ci. Hence, the vector containing the posterior probabilities for each class is considered instead of only the label assigned to each xk ∈ θj. Moreover, this method also weights the influence of each sample, xk, in the region of competence according to its Euclidean distance to the query xj: the closest samples have a higher influence on the computation of the competence level δi,j. Eq. (4) shows the calculation of the competence level δi,j using the A Priori method:

\delta_{i,j} = \frac{\sum_{k=1}^{K} P(\omega_l \mid x_k \in \omega_l, c_i)\, W_k}{\sum_{k=1}^{K} W_k}    (4)

The classifier with the highest value of δi,j is selected. However, this selected classifier is only used to predict the label of xj if its competence level is significantly better than that of the other base classifiers in the pool (i.e., when the difference in competence level is higher than a predefined threshold). Otherwise, all classifiers in the pool are combined using the majority voting rule.
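A minimal sketch of these local-accuracy-based estimates over a region of competence θj (given as indices into DSEL): plain local accuracy as in OLA (Eq. (2)) and a distance-weighted variant in the spirit of the A Priori rule (Eq. (4)). Replacing the posteriors P(ωl | xk, ci) with correct/incorrect indicators and using Wk = 1/dk are simplifying assumptions.

```python
# Minimal sketch of local-accuracy-based competence estimates over the region
# of competence theta_j (indices `idx` into DSEL). Correct/incorrect indicators
# are used in place of the posterior probabilities of Eqs. (2) and (4).
import numpy as np

def ola_competence(clf, X_dsel, y_dsel, idx):
    """Eq. (2) style: plain accuracy of clf over the K neighbors."""
    return np.mean(clf.predict(X_dsel[idx]) == y_dsel[idx])

def a_priori_competence(clf, X_dsel, y_dsel, idx, x_query):
    """Eq. (4) style: neighbors closer to the query weigh more (W_k = 1/distance)."""
    dists = np.linalg.norm(X_dsel[idx] - x_query, axis=1)
    weights = 1.0 / (dists + 1e-12)                 # avoid division by zero
    correct = (clf.predict(X_dsel[idx]) == y_dsel[idx]).astype(float)
    return np.sum(correct * weights) / np.sum(weights)
```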
4.5. A Posteriori

The A Posteriori method [129] works similarly to the A Priori method. The only difference is that it takes into account the class predicted by the base classifier ci for the test sample xj during the competence estimation (Eq. (5)).

\delta_{i,j} = \frac{\sum_{x_k \in \omega_l} P(\omega_l \mid x_k, c_i)\, W_k}{\sum_{k=1}^{K} P(\omega_l \mid x_k, c_i)\, W_k}    (5)

The classifier with the highest value of δi,j is selected. As in the A Priori method, the selected classifier is only used to predict the label of xj if its competence level is significantly better than that of the other base classifiers in the pool (i.e., when the difference in competence level is higher than a predefined threshold). Otherwise, all classifiers in the pool are combined using the majority voting rule.

4.6. Multiple Classifier Behavior (MCB)

The MCB technique is based on the Behavior Knowledge Space (BKS) [81] and on the classifiers' local accuracy. Given a new test sample xj, its region of competence, θj, is estimated. Next, the output profiles of the test sample, as well as those of the samples in the region of competence, are computed using the BKS algorithm.

The similarity between the output profile of the test sample x̃j and those from its region of competence, x̃k ∈ θj, is calculated (Eqs. (6) and (7)). Samples with similarities lower than a predefined threshold are removed from the region of competence θj. Hence, the size of the region of competence is variable, since it depends on the degree of similarity between the query sample and those in its region of competence. After all the similar samples are selected, the competence of the base classifier, δi,j, is simply estimated by its classification accuracy in the resulting region of competence.

S(\tilde{x}_j, \tilde{x}_k) = \frac{1}{M} \sum_{i=1}^{M} T(x_j, x_k)    (6)

T(x_j, x_k) = \begin{cases} 1 & \text{if } c_i(x_j) = c_i(x_k), \\ 0 & \text{if } c_i(x_j) \neq c_i(x_k). \end{cases}    (7)

Similar to the A Priori and A Posteriori techniques (Sections 4.4 and 4.5), the decision is made as follows: if the selected classifier is significantly better than the others in the pool (the difference in competence level is higher than a predefined threshold), it is used for the classification of xj; otherwise, all classifiers in the pool are combined using the majority voting rule.

4.7. Modified Local Accuracy (MLA)

Proposed by Smits [32], this technique aims to alleviate the problem of defining the size of the region of competence (i.e., the number of instances selected to compose the region of competence). When the value of K is too high, instances that are not similar to xj may be included in the region of competence, while a low value of K may lead to insufficient information. To tackle this issue, the MLA algorithm weights each instance in θj by its distance to xj (Eq. (8)):

\delta_{i,j} = \sum_{k=1}^{K} P(\omega_l \mid x_k \in \omega_l, c_i)\, W_k    (8)

The classifier presenting the highest competence level, δi,j, is selected to predict the label of xj.

4.8. DES-Clustering (DES-KMEANS)

In this method [30], the K-Means algorithm is applied to DSEL in order to sub-divide this set into several clusters. For each cluster produced, the classifiers are ranked in decreasing order of accuracy and in increasing order of diversity. The Double-Fault measure [67] is used to measure the diversity of the base classifiers. The N most accurate and the J most diverse classifiers are associated to each cluster.

Given a new sample xj of unknown class, its Euclidean distance to the centroid of each cluster is calculated. Then, the set of N most accurate and J most diverse classifiers associated to the nearest cluster is used to compose the ensemble of classifiers C'.

4.9. DES-KNN

The first step in this technique is to compute the region of competence θj. Then, the base classifiers are ranked in decreasing order of accuracy and in increasing order of diversity based on the samples belonging to θj. The Double-Fault measure [67] was used, since it presented the highest correlation with ensemble accuracy in the study conducted by Shipp and Kuncheva [131]. Then, the N most accurate classifiers and the J most diverse classifiers are selected to compose the EoC, C'. The values of J and N (J ≤ N) must be defined prior to applying this method.

4.10. KNORA-Eliminate

The KNORA-Eliminate technique [26] explores the concept of the Oracle, which is the upper limit of a DCS technique. Given the region of competence θj, only the classifiers that correctly recognize all samples belonging to the region of competence are selected. In other words, all classifiers that achieve 100% accuracy in this region (i.e., that are local Oracles) are selected to compose the ensemble C'. Then, the decisions of the selected base classifiers are aggregated using the majority voting rule. If no base classifier is selected, the size of the region of competence is reduced, and the search for competent classifiers is restarted.

4.11. KNORA-Union

The KNORA-Union technique [26] selects all classifiers that are able to correctly recognize at least one sample in the region of competence. This method also considers that a base classifier can participate more than once in the voting scheme when it correctly classifies more than one instance in the region of competence. The number of votes of a given base classifier ci is equal to the number of samples in the region of competence, θj, for which it predicted the correct label. For instance, if a given base classifier ci predicts the correct label for three samples belonging to θj, it gains three votes for the majority voting scheme. The votes collected by all base classifiers are aggregated to obtain the ensemble decision.
4.12. Randomized Reference Classifier (DES-RRC)

This method uses the randomized reference classifier (RRC), proposed in [105], in order to decide whether or not the base classifier ci performs significantly better than a random classifier. The level of competence of ci is computed based on two parts: a source of competence, Csrc, and the Gaussian potential function K(xk, xj) (Eq. (1)), which is used to reduce the influence of each data point in DSEL based on its Euclidean distance to xj. Thus, the competence level of a base classifier, ci, for the classification of the query, xj, is estimated using Eq. (9):

\delta_{i,j} = \sum_{x_k \in \mathrm{DSEL}} C_{src}\, K(x_k, x_j)    (9)

The source of competence Csrc is estimated based on the concept of the randomized reference classifier (RRC) [105] (the Matlab code for this technique is available at http://www.mathworks.com/matlabcentral/fileexchange/28391-a-probabilistic-model-of-classifier-competence). The base classifiers with a level of competence δi,j higher than the competence of the random classifier (1/L, for a problem with L classes) are selected to compose the ensemble C'.
4.13. Dynamic Ensemble Selection Performance (DES-P)

Proposed by Woloszynski et al. [34], this method works as follows. First, the local performance of a base classifier ci is calculated using the region of competence θj. The competence of the base classifier is then calculated as the difference between the accuracy of the base classifier ci in the region of competence θj (denoted by P̂(ci | θj)) and the performance of the random classifier, that is, the classification model that randomly chooses a class with equal probabilities. For a classification problem with L classes, the performance of the random classifier is RC = 1/L. Hence, the competence level δi,j in this technique is calculated according to Eq. (10):

\delta_{i,j} = \hat{P}(c_i \mid \theta_j) - \frac{1}{L}    (10)

The base classifiers with a positive value of δi,j, i.e., that obtain a local accuracy higher than the random classifier, are selected to compose the ensemble C'.

4.14. Kullback–Leibler divergence (DES-KL)

The DES-KL method [34] measures the competence of the base classifiers from an information theory perspective. For each instance xk from the whole DSEL, the source of competence Csrc is calculated as the Kullback–Leibler (KL) divergence between the uniform distribution and the vector of class supports, S(xk) = {S1(xk), ..., SL(xk)}, estimated by the base classifier ci. Then, a Gaussian potential function is applied to weight the source of competence based on the Euclidean distance between xk and the query sample xj (Eq. (11)):

\delta_{i,j} = \sum_{x_k \in \mathrm{DSEL}} C_{src} \exp(-d(x_k, x_j)^2)    (11)

Since the KL divergence is always positive, the sign of Csrc is set as positive if the base classifier ci predicted the correct label for the instance xk, and negative otherwise. After computing the KL divergence for all samples in DSEL, the base classifiers ci with a positive value of δi,j are selected to compose the EoC, C'.

4.15. K-Nearest Output Profiles (KNOP)

The K-Nearest Output Profiles (KNOP) technique [106] works similarly to the KNORA-U technique, with the difference being that KNORA-U works in the feature space, while KNOP works in the decision space. First, the output profile transformation is applied over the input xj, giving its output profile x̃j. Then, the similarity between x̃j and the output profiles from the dynamic selection dataset is computed and stored in the set φj. Similarly to the KNORA-U rule, each time a base classifier performs a correct prediction for a sample belonging to φj, it gains one vote. The votes obtained by all base classifiers are aggregated to obtain the ensemble decision.

4.16. META-DES

The META-DES framework is based on the assumption that the dynamic ensemble selection problem can be considered as a meta-problem [124]. This meta-problem uses different criteria regarding the behavior of a base classifier ci in order to decide whether it is competent enough to classify a given test sample xj. The meta-problem is defined as follows [27]:

• The meta-classes are either "competent" (1) or "incompetent" (0) to classify xj.
• Each set of meta-features fi corresponds to a different criterion for measuring the level of competence of a base classifier.
• The meta-features are encoded into a meta-features vector vi,j.
• A meta-classifier λ is trained based on the meta-features vi,j to predict whether or not ci will achieve the correct prediction for xj, i.e., whether it is competent enough to classify xj.

In other words, a meta-classifier, λ, is trained to predict whether a base classifier ci is competent enough to classify a given test sample xj. After the pool of classifiers is generated, the framework performs a meta-training stage, in which the meta-features are extracted from each instance belonging to the training set and the dynamic selection dataset (DSEL). Then, the extracted meta-features are used to train the meta-classifier λ. Thus, the advantage of using meta-learning is that multiple criteria can be encoded as different sets of meta-features in order to estimate the competence level of the base classifiers. In addition, the selection rule is learned by the meta-classifier using the meta-features extracted from the training data.

When an unknown sample, xj, is presented to the system, the meta-features are calculated according to xj and presented to the meta-classifier. The competence level δi,j of the base classifier ci for the classification of xj is then estimated by the meta-classifier.

The impact of the meta-classifier, as well as variations of the META-DES framework, was evaluated in [97]. Four classifier models for the meta-classifier were evaluated: MLP, SVM, a Naive Bayes (NB) classifier and Random Forest (RF). Experimental results demonstrated that the performances of the MLP, SVM and NB were statistically equivalent. However, since the NB obtained the highest number of wins in the experimental analysis, it was selected as the overall best classification model for the meta-classifier. Three versions of the META-DES framework were evaluated: dynamic selection, dynamic weighting and hybrid. In the dynamic selection approach, only the classifiers that attain a certain level of competence are used to classify a given query sample. In the dynamic weighting approach, all classifiers in the pool are used for classification; however, their decisions are weighted based on their estimated competence levels, so that classifiers that attain a higher level of competence for the classification of the given query sample have a greater impact on the final decision. The hybrid approach conducts both steps: first, the base classifiers that obtain a certain competence level are selected; then, their outputs are aggregated using a weighted majority voting scheme based on their respective competence levels.
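A minimal sketch of the meta-learning idea, not the full META-DES framework: for every (sample, base classifier) pair a small meta-feature vector is built and labeled as competent or incompetent, and a Naive Bayes meta-classifier is trained on these pairs. The two meta-features used here are illustrative; META-DES defines several richer sets.

```python
# Minimal sketch of meta-learning for dynamic selection. For each
# (sample, base classifier) pair, a tiny meta-feature vector is built and
# labeled 1 ("competent": the classifier gets the sample right) or 0.
# The two meta-features used here (local accuracy in the K-NN region and the
# classifier's confidence on the query) are illustrative assumptions.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import NearestNeighbors

def build_meta_dataset(pool, X_meta, y_meta, X_dsel, y_dsel, k=7):
    nn = NearestNeighbors(n_neighbors=k).fit(X_dsel)
    neighbors = nn.kneighbors(X_meta, return_distance=False)
    V, labels = [], []
    for j, (x, y) in enumerate(zip(X_meta, y_meta)):
        idx = neighbors[j]
        for clf in pool:
            local_acc = np.mean(clf.predict(X_dsel[idx]) == y_dsel[idx])
            confidence = clf.predict_proba(x.reshape(1, -1)).max()
            V.append([local_acc, confidence])                  # meta-feature vector v_{i,j}
            labels.append(int(clf.predict(x.reshape(1, -1))[0] == y))
    return np.array(V), np.array(labels)

# Hypothetical usage (pool, X_train, y_train, X_dsel, y_dsel defined elsewhere):
# meta_classifier = GaussianNB().fit(*build_meta_dataset(pool, X_train, y_train,
#                                                        X_dsel, y_dsel))
```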
4.17. META-DES.Oracle (META-DES.O)

An improvement to the META-DES was proposed in [99]. In this new version of the framework, a total of 15 sets of meta-features were considered. Following that, a meta-feature selection scheme using Binary Particle Swarm Optimization (BPSO) was conducted in order to optimize the performance of the meta-classifier, λ. The difference between the level of competence estimated by the meta-classifier and that estimated by the Oracle was used as the fitness function for the BPSO meta-feature selection scheme, so that the difference between the behavior of the meta-classifier and that of the Oracle in estimating the competence level of the base classifiers was minimized. The new framework was called META-DES.Oracle since it is based on the Oracle definition.

Experimental results conducted in [99] demonstrated that META-DES.Oracle dominates the classification results when compared against previous DES techniques. Its performance is statistically better than any of the 10 state-of-the-art techniques considered, including META-DES. This can be explained by two factors: state-of-the-art DES techniques are based on only one criterion to estimate the competence of the base classifier (this criterion could be local accuracy, ranking, probabilistic models, etc.), and, in addition, through the BPSO meta-feature selection scheme, only the meta-features that are relevant for the given classification problem are selected and used for the training of the meta-classifier λ.

4.18. Dynamic Selection on Complexity (DSOC)

Brun et al. [123] proposed an interesting dynamic classifier selection approach which takes into account data complexity measures from Ho and Basu [132], together with the local accuracy estimates of the base classifiers, to perform dynamic selection. The proposed system is called Dynamic Selection on Complexity (DSOC).

DSOC aims to select the base classifier ci that presents not only a high local performance, but that was also trained on a data distribution whose complexity measures (regarding the shape of the decision boundary and the overlap between the classes) are similar to those extracted from the neighborhood of the query sample xj. Three complexity measures were considered: Fisher's discriminant ratio (F1), the intra/inter class ratio (N2) and the non-linearity of the 1-NN classifier (N4). More details about these complexity metrics are given in [132]. The base classifier is selected taking into consideration three features.

4.19. Dynamic selection in different contexts

4.19.1. One-Versus-One decomposition (OVO)

Another context in which dynamic selection has recently shown a lot of promise is One-Versus-One (OVO) decomposition strategies [133]. OVO works by dividing a multi-class classification problem into as many binary problems as there are combinations between pairs of classes [133]. Each base classifier is trained solely to distinguish between one pair of classes. When a new query sample is presented for classification, the outputs of all base classifiers are combined to predict its label. However, since each base classifier is only trained for a pair of classes, the majority of base classifiers might not even be trained for the corresponding class, and their decisions may hinder the performance of the system. This is called the "non-competent classifier" problem, which is a crucial problem in OVO strategies [133].

Dynamic selection represents an interesting way of solving this non-competent classifier problem, since it provides a methodology for estimating the competence of the base classifiers on the fly, thus avoiding non-competent classifiers which may hinder the system decision during the generalization phase. Five dynamic selection techniques were proposed in this context.
Table 2
Applications using DS.
Application | Pool generation method | DS techniques | Reference
Credit scoring | Bagging and Boosting | OLA, LCA, KNORA-E, KNORA-U | [9,10]
Customer classification | Different training sets | LCA, OLA, KNORA-E | [126]
Music classification | Different feature sets | KNORA-E, KNORA-U, OLA, LCA | [8,98]
Watch list screening | Different feature sets | Distance-based DS | [7,43,44]
Face recognition | Different feature sets | OLA, LCA, Distance-based DS | [44,45]
Handwriting recognition | Bagging | KNORA-E, KNORA-U, KNOP, OLA, LCA | [26,98,113]
Signature verification | Random subspaces | KNOP | [42,121]
Forest species | Different feature sets | MCB | [137]
Remote sensing images | Bagging | MLA | [32]
Time series forecasting | Heterogeneous classifiers | MCB | [138]
Antibiotic resistance | Bagging | LCA | [37]
Bioprosthetic hand | Heterogeneous classifiers | DES-RRC | [139–141]
Classification results demonstrated that the DRCW-OVO-DES-P strategy outperformed the other dynamic selection approaches in the context of OVO decomposition classification.

4.19.2. One-Class Classification (OCC)

One-class classification (OCC) is one of the most difficult problems in machine learning. It is based on the assumption that, during the training stage, only objects originating from one class are present, with no access to counterexamples, making it difficult to train an efficient classifier, since there is no data available to properly estimate its parameters [35,135]. It can therefore be considered an ill-posed problem. As has been mentioned in several works [27,99,106], dynamic selection techniques outperform strong classification models, such as SVMs, when dealing with ill-posed problems. Therefore, the application of DS in the context of OCC is expected to improve generalization performance.

As reported in [35], dynamic selection techniques work well in this context because during the training stage it is possible to generate a diverse pool of classifiers; the flexibility of DS techniques then allows the most competent classifier to be selected to classify each new test sample. Three One-Class DCS (OCDCS) methods were proposed by adapting three DCS methods proposed in [136]: One-Class Entropy Measure Selection (OCDCS-EM), which is based on the entropy measure, One-Class Minimum Difference Selection (OCDCS-MD), and the Fuzzy Competence Selection (OCDCS-FC). These three OCDCS methods are based on the potential function model, weighting the decisions obtained for each sample in the reference set according to its distance to the given query instance.

An experimental evaluation demonstrated that OCDCS-EM works well for small datasets, while the Fuzzy Competence Selection (OCDCS-FC) is the best choice for large datasets. However, only DCS methods were proposed for OCC. Since DES usually presents better results than DCS, it is reasonable to think that a next step in this direction would be to adapt DES techniques for one-class classification problems.

5. Applications

In this section, we present a review of real-world applications using dynamic selection techniques. Moreover, we also discuss how the authors adapt traditional DS techniques to the intrinsic characteristics of their applications; these include aspects such as imbalanced distributions in customer classification and credit scoring [126] and the lack of validation samples in face recognition applications [7,45].

Table 2 lists several real-world applications of DS techniques. Based on the usage of DS techniques across these applications, the KNORA-E and KNORA-U methods are the most commonly used. This may be explained by the fact that these techniques represent a good trade-off between simplicity and classification accuracy. Moreover, we can see that DES methods are more popular than DCS ones, which may be accounted for by the many works pointing out that DES techniques usually achieve higher classification accuracy. Another interesting fact is that the majority of the related works apply DS techniques to a pool of classifiers trained using different feature spaces [7,8,43,45,137].

5.1. Credit scoring

Credit scoring is one of the most studied problems in pattern recognition. Its difficulty comes from the observation that the problem is often heavily imbalanced, since there are far fewer samples from customers with poor credit scores [126]. Moreover, high accuracy in this problem is very important, since even a 1% improvement in classification accuracy can greatly increase the profits of financial institutions [142].

Many works consider the use of DS techniques for the credit scoring problem [9,10,126]. In [126], the authors proposed a dynamic ensemble selection system that takes into account the cost of misclassification of each class. The proposed approach, called Dynamic Classifier Ensemble for Imbalanced Distributions (DCEID), uses cost-sensitive versions of the LCA and OLA techniques to deal with the imbalanced nature of this problem. Another recent use of DS techniques for credit scoring was proposed by Xiao et al. [10]. The authors proposed a new ensemble generation method to increase the diversity between the members of the pool. Then, two DCS and two DES schemes were considered: LCA and OLA as DCS, and KNORA-E and KNORA-U as DES methods. Similar to previous works, the authors concluded that the DES techniques presented the best classification performance.

An extensive comparison between DS techniques and other classification schemes considering eight credit scoring datasets was conducted by Lessmann et al. [9]. However, the dynamic selection techniques considered in their analysis did not improve the classification accuracy when compared to traditional credit scoring approaches.

5.2. Music genre classification

In [8], the authors investigated the use of dynamic ensemble selection techniques for music genre classification. In their solution, a pool of weak classifiers was first generated, trained with distinct segments of the audio signal and different feature extraction methods. A total of 13 distinct feature extraction methods were considered, each corresponding to a different musical aspect, such as harmony, timbre and rhythm. In the generalization phase, the KNORA-E and KNORA-U techniques were considered for the classification of each unknown music sample. In the experiments conducted using the Latin Music Dataset [143], the use of DES
achieved a recognition accuracy of about 70%, which significantly improved the classification performance compared to the 54% obtained by the best single classifier model.

5.3. Image recognition

5.3.1. Face recognition and watch list screening

The face recognition problem presents an interesting use of dynamic selection techniques, since in such applications there is not enough data available to build a dynamic selection dataset (DSEL). Thus, the authors in [7,43] adapted the dynamic selection techniques considering just the query sample and a pool of SVM classifiers trained using different feature extraction methods. The algorithm is based on the distance between the query sample and the support vectors of the SVM classifier for the negative samples (i.e., face images belonging to different users). The higher the distance to the support vectors, the more competent the SVM-feature extraction pair is. Therefore, the system is not only selecting the SVM, but also choosing which feature extraction method is more suitable for the classification of an unknown face image.

5.3.2. Handwriting recognition

Dynamic ensemble and classifier selection techniques have been used to solve several image recognition problems. Ko et al. [26] used the KNORA-E and KNORA-U techniques for the NIST SD19 handwritten recognition dataset. Although these techniques did not present the highest classification performance, their classification results were among the best achieved for this problem to that point. In addition, the proposed KNORA techniques outperformed static ensemble selection schemes such as GA and MVE [46].

Furthermore, in [113], the authors applied the KNOP technique to four handwriting recognition datasets: NIST Letters, NIST Digits, Japanese Vowels, and Arabic Digits. Experiments demonstrated that the proposed approach outperformed the state-of-the-art methods for the Arabic and Japanese datasets. The accuracies achieved on the NIST Letters and Digits datasets were comparable to the state-of-the-art results obtained by SVM and MLP classifiers [144,145]. However, in another publication, Cavalin et al. [106] demonstrated that the use of dynamic selection outperforms both SVM and MLP classifiers when the training dataset is small (less than 5000 examples).

5.3.3. Signature verification

Batista et al. [42] evaluated four dynamic selection techniques for the signature verification problem: KNORA-UNION, KNORA-ELIMINATE and their corresponding versions using output profiles, namely OP-ELIMINATE and OP-UNION. The system was based on an ensemble of Hidden Markov Models (HMMs) used as feature extractors from the signature image. Each HMM was trained using different numbers of states and codebook sizes in order to learn signatures from different levels of perception. The features extracted using the HMMs were merged into a feature vector. For each writer, the random subspace method was used to train a pool of 100 Gaussian SVM classifiers.

The proposed approach was applied to two well-known signature verification datasets: GPDS [146] and a Brazilian signature dataset composed of random, simple and skilled forgeries. Experimental results demonstrated that dynamic selection can significantly reduce the overall error rates as compared to other combination methods. In the majority of the experiments conducted, OP-ELIMINATE presented the best performance. It must be noted, however, that the OP-UNION method worked better when the SVM classifiers were trained with a limited number of signature samples, since classifiers trained with fewer signatures are less accurate and more classifiers are thus needed to form a robust EoC. Nevertheless, in all cases, dynamic selection provided a better classification performance when compared to static ensemble techniques.

5.3.4. Forest species recognition

Martins et al. [137] used DS for the forest species recognition problem. The system was based on multiple feature extraction methods, such as texture (Gabor filters and Local Binary Patterns) as well as keypoint-based features (SIFT and SURF), to generate a diverse pool of classifiers. Then, several static and dynamic selection methods were evaluated. The MCB technique presented the best result, achieving a 93.03% accuracy.

5.4. Time series forecasting

Sergio et al. [138] proposed a dynamic selection of regressors for time series forecasting. An adaptation of the MCB technique for regression was used to perform the dynamic selection steps. The proposed system was called Dynamic Selection of Forecast Combiners (DS-FC). The pool of regressors was composed of four regression models: a feed-forward neural network with one hidden layer, a feed-forward neural network with two hidden layers, a deep belief network (DBN) with two hidden layers and a support vector machine for regression (SVR).

The proposed DS-FC was used to forecast eight time series with chaotic behavior, considering short and long term series. The proposed dynamic selection scheme outperformed static combination schemes in six out of eight time series. Moreover, it also presented better results than most of the state-of-the-art time series forecasting techniques.

5.5. Biomedical

5.5.1. Antibiotic resistance

Tsymbal et al. [37] proposed a dynamic ensemble method to tackle the problem of antibiotic resistance. This is a typical example of a changing environment (concept drift), where pathogen sensitivity may change over time as new pathogen strains develop resistance to previously used antibiotics. A dynamic classifier method was proposed to deal with this problem, with the authors considering a variation of the LCA technique in which the distance between the neighbors is also taken into account.

Three dynamic approaches were considered: Dynamic Voting (DV), Dynamic Selection (DS) and Dynamic Voting with Selection (DVS). The methodology was evaluated considering gradual and abrupt drift scenarios. Experimental results demonstrated that the approaches using dynamic selection presented the best performance. Furthermore, the dynamic approaches always obtained a better result than the best base classifier model and ensemble technique.

5.5.2. Bioprosthetic hand

Kurzynski et al. [139,147] proposed a dynamic ensemble selection system for the recognition of electromyography (EMG) signals for the control of a bioprosthetic hand. The proposed solution was based on the estimation of the classifiers' competence using the probabilistic randomized reference classifier proposed in [25]. The pool of classifiers consisted of seven different classification models, including linear and quadratic discriminant classifiers (LDC and QDC) and MLP neural networks.

Moreover, in [140,141], the authors proposed a method for the control of a bioprosthetic hand using a two-stage MCS and DES for the recognition of EMG and mechanomyographic (MMG) signals indicating the patient's movement intention. Additionally, feedback information coming from the bioprosthesis sensors was used to calibrate the competence of the base classifiers estimated using the
RRC technique during the operation phase. The two-stage technique developed provided state-of-the-art results for the control of the bioprosthetic hand, considering several types of movements, such as hook, power and pinch.

Table 3
Summary of the 30 datasets used in the experiments.

6. Comparative study

The comparative study considers 18 dynamic selection techniques, including the K-Nearest Output Profiles (KNOP) [106,113], Dynamic Ensemble Selection Performance (DES-P) [34], Dynamic Ensemble Selection Kullback–Leibler (DES-KL) [34], DES-Clustering [30], DES-KNN [30], Meta-Learning for Dynamic Selection (META-DES) [27] and META-DES.Oracle [99]. All the DS methods considered in this evaluation are detailed in Section 4 and summarized in Table 1.
Fig. 5. Average rank of the 18 dynamic selection methods over the 30 datasets. The best algorithm is the one presenting the lowest average rank. Techniques in which the
difference in average ranks is lower than the critical difference are connected by a black bar.
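The average ranks reported in Fig. 5 and Table 4 follow the Friedman ranking procedure described in Section 6.3; the sketch below shows the computation on a made-up accuracy matrix.

```python
# Minimal sketch of the average-rank computation used by the Friedman test
# (Section 6.3): per dataset, the best method gets rank 1, the second best
# rank 2, and so on, with tied accuracies sharing the average of their ranks.
# The accuracy matrix below is a made-up example (rows = datasets,
# columns = methods), not data from the paper.
import numpy as np
from scipy.stats import rankdata

accuracies = np.array([[0.83, 0.81, 0.81],
                       [0.90, 0.88, 0.91],
                       [0.75, 0.77, 0.74]])

# rankdata ranks ascending, so rank the negated accuracies; 'average' handles ties.
ranks = np.apply_along_axis(lambda row: rankdata(-row, method='average'),
                            axis=1, arr=accuracies)
average_ranks = ranks.mean(axis=0)      # one average rank per method
print(average_ranks)
```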
Table 4 ranks is lower than the critical difference are connected by a black
Overall results considering the 30 classification datasets. The average
bar (i.e., the results are statistically equivalent according to the
ranks and accuracy for each DS technique are presented. Standard
deviation is presented in parenthesis. ranking analysis).
Based on the ranking analysis, we can clearly see that DES tech-
DS method Avg. rank DS method Mean accuracy
niques outperform DCS ones. Among the top 10 techniques, 8 are
META-DES.O 3.87(3.54) META-DES.O 83.92(9.13) DES. The only DES methods that did not achieve a lowest aver-
META-DES 4.17(2.98) META-DES 83.24(8.94) age rank in comparison with the DCS ones were the DES-KNN and
DES-RRC 5.97(4.66) DES-P 82.26(9.26)
DES-Clustering. Interestingly, these two techniques take into ac-
KNORA-U 6.83(4.11) DES-RRC 82.11(8.76)
DES-P 7.13(3.69) KNORA-U 81.69(9.82) count diversity measures to increase the diversity in the EoC, af-
DES-KL 7.73(4.92) DES-KL 81.52(8.77) ter the most competent classifiers are selected. This result may in-
KNOP 9.53(3.98) KNOP 80.81(8.92) dicate that adding diversity to EoC does not provide classification
KNORA-E 9.77(3.88) KNORA-E 80.36(10.75)
benefits at the instance level, i.e., for the classification of a single
LCA 10.10(4.66) OLA 79.87(10.67)
OLA 10.40(4.95) DCS Rank 79.69(10.38) instance. As reported in [156], we should promote the consensus
MCB 11.17(4.74) LCA 79.57(9.84) in the EoC, for the classification of a single query xj , rather than
DES-KNN 11.17(4.40) MCB 79.56(9.70) diversity. Our experimental analysis supports this hypothesis.
DSOC 11.37(5.74) DSOC 79.33(9.44) The six techniques with the lowest average rankings (DES-
A Posteriori 11.47(5.56) DES-KNN 79.29(10.23)
P, DES-KL, KNORA-U, DES-RRC, META-DES and META-DES.Oracle)
DCS Rank 11.80(4.20) A Priori 78.57(11.18)
DES-KMEANS 12.73(3.84) DES-KMEANS 78.49(10.40) were considered equivalent in this analysis. However, one problem
MLA 12.80(4.60) A Posteriori 78.14(11.53) with the ranking analysis is that the result of the comparison be-
A Priori 13.00(4.53) MLA 77.34(9.78) tween two techniques changes according to the other techniques
that were considered in the test. This problem can cause several
type I errors according to [157]. For this reason, we performed a
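The rank computation behind Table 4 and Fig. 5 is straightforward to reproduce. The sketch below follows the procedure just described (rank 1 for the most accurate method on each dataset, ties averaged) and the CD formula from [155]; the accuracy values and the q_alpha constant are placeholders, not the paper's numbers.

```python
import numpy as np
from scipy.stats import rankdata

def average_ranks(acc):
    """acc: (n_datasets, n_methods) accuracies; rank 1 = best method on a dataset,
    ties receive the average of the ranks they span."""
    return np.vstack([rankdata(-row) for row in acc]).mean(axis=0)

def bonferroni_dunn_cd(q_alpha, n_methods, n_datasets):
    """CD = q_alpha * sqrt(k(k+1)/(6N)); q_alpha is read from the table in [155]."""
    return q_alpha * np.sqrt(n_methods * (n_methods + 1) / (6.0 * n_datasets))

acc = np.array([[0.83, 0.81, 0.79],     # toy accuracies: 4 datasets x 3 methods
                [0.90, 0.91, 0.88],
                [0.75, 0.75, 0.70],
                [0.88, 0.86, 0.86]])
print(average_ranks(acc))               # -> [1.375, 1.75, 2.875]; lowest is best
print(bonferroni_dunn_cd(2.394, n_methods=3, n_datasets=4))  # q_alpha is a placeholder
```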
Table 4
Overall results considering the 30 classification datasets. The average ranks and accuracy for each DS technique are presented. Standard deviation is presented in parentheses.

DS method       Avg. rank       DS method       Mean accuracy
META-DES.O      3.87 (3.54)     META-DES.O      83.92 (9.13)
META-DES        4.17 (2.98)     META-DES        83.24 (8.94)
DES-RRC         5.97 (4.66)     DES-P           82.26 (9.26)
KNORA-U         6.83 (4.11)     DES-RRC         82.11 (8.76)
DES-P           7.13 (3.69)     KNORA-U         81.69 (9.82)
DES-KL          7.73 (4.92)     DES-KL          81.52 (8.77)
KNOP            9.53 (3.98)     KNOP            80.81 (8.92)
KNORA-E         9.77 (3.88)     KNORA-E         80.36 (10.75)
LCA             10.10 (4.66)    OLA             79.87 (10.67)
OLA             10.40 (4.95)    DCS Rank        79.69 (10.38)
MCB             11.17 (4.74)    LCA             79.57 (9.84)
DES-KNN         11.17 (4.40)    MCB             79.56 (9.70)
DSOC            11.37 (5.74)    DSOC            79.33 (9.44)
A Posteriori    11.47 (5.56)    DES-KNN         79.29 (10.23)
DCS Rank        11.80 (4.20)    A Priori        78.57 (11.18)
DES-KMEANS      12.73 (3.84)    DES-KMEANS      78.49 (10.40)
MLA             12.80 (4.60)    A Posteriori    78.14 (11.53)
A Priori        13.00 (4.53)    MLA             77.34 (9.78)

Based on the ranking analysis, we can clearly see that DES techniques outperform DCS ones. Among the top 10 techniques, 8 are DES. The only DES methods that did not achieve a lower average rank than the DCS ones were DES-KNN and DES-Clustering. Interestingly, these two techniques take diversity measures into account to increase the diversity in the EoC after the most competent classifiers are selected. This result may indicate that adding diversity to the EoC does not provide classification benefits at the instance level, i.e., for the classification of a single instance. As reported in [156], we should promote consensus in the EoC for the classification of a single query xj, rather than diversity. Our experimental analysis supports this hypothesis.

The six techniques with the lowest average rankings (DES-P, DES-KL, KNORA-U, DES-RRC, META-DES and META-DES.Oracle) were considered equivalent in this analysis. However, one problem with the ranking analysis is that the result of the comparison between two techniques changes according to the other techniques considered in the test. This problem can cause several type I errors according to [157]. For this reason, we performed a new test considering only these top 6 techniques. The CD diagram considering only the top 6 techniques is shown in Fig. 6. Moreover, the classification results obtained for the 30 datasets by the top 6 DS techniques are presented in Table 5.

Furthermore, for a better comparison of the top DS techniques, we conducted two additional analyses: an n × n comparison considering the hypothesis of equality between all existing pairs of algorithms using the Bergmann–Hommel procedure [158–160], and a pairwise test using the Wilcoxon Sign Test as recommended by Benavoli et al. [157]. The results of both tests for each pairwise comparison are presented in Table 6. Hypotheses that are rejected at α = {0.1, 0.05, 0.01} are marked with •, ••, and •••, respectively. The Bergmann–Hommel test was conducted using the JAVA code published by Garcia and Herrera [159] (code available at http://sci2s.ugr.es/keel/multipleTest.zip).

Based on the results, we can see that the two versions of the META-DES framework and the DES based on the Randomized Reference Classifier (DES-RRC) presented the best results. According to the Bergmann–Hommel test, these three techniques are statistically equivalent. However, the Wilcoxon Sign test shows that both META-DES and META-DES.Oracle outperform the DES-RRC method, with a level of significance α = 0.01.
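The pairwise Wilcoxon analysis treats the per-dataset accuracies of two techniques as paired samples. A sketch using scipy, with synthetic placeholder numbers rather than the values behind Table 5, could look as follows.

```python
import numpy as np
from itertools import combinations
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
# Placeholder accuracies of three hypothetical techniques over 30 datasets.
results = {name: np.clip(rng.normal(mu, 0.03, size=30), 0, 1)
           for name, mu in [("tech_a", 0.84), ("tech_b", 0.83), ("tech_c", 0.81)]}

for a, b in combinations(results, 2):
    stat, p = wilcoxon(results[a], results[b])   # paired, two-sided signed-rank test
    print(f"{a} vs {b}: W = {stat:.1f}, p = {p:.4f}")
```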
Fig. 6. Average rank of the top six dynamic selection methods over the 30 datasets.
Table 5
Mean and standard deviation results for the top 6 DS techniques. The best results for each dataset are highlighted
in bold.
6.4. Comparison with different classification approaches

In this section, we compare the results obtained by the DS techniques against monolithic classifier models. The objective of this study is to determine how the performance of DS methods compares with that of the best off-the-shelf classifiers. Three single classifier models were considered: a Multi-Layer Perceptron (MLP) neural network, a Support Vector Machine with a Gaussian kernel (SVM) and a K-Nearest Neighbor classifier. As ensemble methods, we considered the Random Forest [63] and AdaBoost [52] techniques. These classifiers were selected based on a recent study [4] that ranked the best classifiers in a comparison considering a total of 179 classifiers over 121 classification datasets.

Furthermore, as reported by Britto et al. [24], the performances of dynamic selection techniques are usually compared with those of the best classifier in the pool, Single Best (SB); the selection of the best base classifiers in the pool, Static Selection (SS); and the majority voting combination of all classifiers in the pool, Majority Vote (MV). We also included these three methods since they are extensively used as baseline methods in the dynamic selection literature.

All classifiers were evaluated using the Matlab PRTOOLS toolbox [152]. The dynamic selection dataset (DSEL) was used as the validation set in the training process of the classifiers and, as a result, all methods were trained using the same amount of data available. The distribution of the test set remained the same. The hyper-parameters of the classifiers were set as follows:
Table 6
Pairwise comparison of the top six DS techniques. (a) Comparison with the adjusted p-values calculated using the Bergmann–Hommel procedure. (b) Pairwise comparison using the Wilcoxon Sign test. The hypotheses are ordered in ascending order according to the p-value. Hypotheses that are rejected at α = {0.1, 0.05, 0.01} are marked with •, ••, and •••, respectively.
1. Single Best (SB): The base classifier with the highest classification accuracy in the validation set is selected for classification.
2. Majority Voting (MV): The outputs of all base classifiers in the pool are combined using the majority voting rule [79] (a minimal sketch of the SB and MV baselines is given after this list).
3. Static Selection (SS): A GA-based ensemble selection approach using the majority voting accuracy, as presented in [46]. The parameters of the GA method were set according to [46]. The validation set was used for the computation of the majority voting accuracy.
4. AdaBoost: We set the number of iterations of the algorithm to 100. The Perceptron classifier was used as the weak model. The Multi-Class AdaBoost [161] was used for the multi-class problems.
5. Random Forest (RF): The number of trees was set to 200. The number of leaves was set to the square root of the number of features, as recommended in [64,162,163].
6. Multi-Layer Perceptron (MLP): We varied the number of neurons in the hidden layer from 10 to 100 in steps of 10. The configuration that achieved the best results on the validation data was used. The MLP training process was conducted using the Levenberg–Marquardt algorithm [164]. The training was stopped if the performance on the validation set decreased or failed to improve for five consecutive epochs (early stopping).
7. Support Vector Machine with a Gaussian Kernel (SVM): A grid search was performed in order to set the values of the regularization parameter, c, and the kernel spread parameter γ.
8. K-Nearest Neighbors (K-NN): For the K-Nearest Neighbors classifier, we considered a neighborhood size K = 7, so that the same neighborhood size was used for both the K-NN and the DS techniques. In addition, we also considered the performance of the 1-NN as the baseline for this method.
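The SB and MV baselines of items 1 and 2 reduce to a few lines. The sketch below assumes a pool of fitted scikit-learn-style classifiers and integer-encoded class labels; it is an illustration, not the PRTOOLS code actually used.

```python
import numpy as np

def single_best(pool, X_val, y_val):
    """SB: the single base classifier with the highest validation accuracy."""
    accuracies = [clf.score(X_val, y_val) for clf in pool]
    return pool[int(np.argmax(accuracies))]

def majority_vote(pool, X):
    """MV: combine every base classifier in the pool with the majority voting rule."""
    votes = np.array([clf.predict(X) for clf in pool])       # (n_classifiers, n_samples)
    return np.array([np.bincount(column).argmax() for column in votes.T])
```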
Table 7
Overall results considering the 30 classification datasets. The average ranks and accuracy for each algorithm are presented. Standard deviation is presented in parentheses.

Algorithm       Avg. rank       Algorithm       Accuracy
META-DES.O      5.43 (4.92)     META-DES.O      83.92 (9.13)
META-DES        5.70 (4.28)     META-DES        83.24 (8.94)
DES-RRC         7.67 (6.23)     DES-P           82.26 (9.26)
DES-P           9.17 (5.27)     SVM             82.22 (10.24)
KNORA-U         9.33 (6.40)     DES-RRC         82.11 (8.76)
DES-KL          9.90 (6.42)     KNORA-U         81.69 (9.82)
SVM             11.07 (8.14)    DES-KL          81.52 (8.77)
KNOP            13.07 (5.86)    KNOP            80.81 (8.92)
KNORA-E         13.23 (5.62)    RF              80.78 (10.98)
RF              13.77 (9.50)    KNORA-E         80.36 (10.75)
LCA             14.30 (6.42)    OLA             79.87 (10.67)
OLA             14.60 (7.02)    DCS Rank        79.69 (10.38)
MV              14.93 (6.62)    LCA             79.57 (9.84)
SS              14.97 (6.38)    MCB             79.56 (9.70)
MCB             15.03 (7.49)    MV              79.51 (9.39)
AdaBoost        15.43 (7.63)    SS              79.40 (10.12)
DES-KNN         15.53 (6.49)    DES-KNN         79.29 (10.23)
DCS Rank        16.33 (5.77)    AdaBoost        79.23 (10.32)
A Posteriori    16.40 (7.91)    MLP             79.20 (11.74)
SB              16.47 (6.04)    SB              79.06 (9.98)
DSOC            16.87 (7.93)    DSOC            79.00 (9.44)
MLP             16.90 (8.45)    A Priori        78.57 (11.18)
7-NN            17.40 (8.59)    DES-KMEANS      78.49 (10.40)
DES-KMEANS      17.50 (6.13)    A Posteriori    78.14 (11.53)
MLA             18.20 (7.41)    7-NN            77.42 (13.06)
A Priori        18.30 (6.24)    MLA             77.34 (9.78)
1-NN            20.50 (8.10)    1-NN            76.64 (11.98)

Table 7 presents the average accuracy of all classification methods, as well as their average ranking. In comparison to the baseline methods (SB, SS and MV), we can see that the majority of the DS techniques improve upon the SB (only the MLA presented both a lower ranking and a lower average accuracy). With respect to SS and MV, 66% of the DS techniques considered presented better results. Furthermore, the top DS techniques (Table 5) also presented a higher average accuracy and a better ranking when compared to the Random Forest and AdaBoost techniques, which are classical static ensemble techniques.

The SVM and RF classifiers presented very high recognition accuracy. However, it must be pointed out that the DS techniques in this paper were all evaluated using a pool of weak, linear classifiers. All methods considered in this work could also benefit from the use of a pool of SVM classifiers, as in the following works [7,25,30,34,156]. It is also possible to train a Random Forest and apply dynamic selection for classification, instead of majority voting.

Moreover, the hyper-parameters of the SVM classifier were optimized for each dataset, while for the DS techniques the values of the hyper-parameters were set based on previous publications [26,102]. An optimization of the hyper-parameters (e.g., the neighborhood size K for the DS methods based on the KNN), as well as the evaluation of DS techniques using a different base classifier model such as the SVM, could further illustrate the benefits of DS techniques. The use of techniques to estimate the best number of classifiers in the pool, such as in [165], could also be employed to improve the classification performance of DS algorithms.
may not be very relevant for dynamic selection schemes. Based on this analysis, the authors propose a new metric, called the Hit-rate, which takes into account the local information from the DS methods. They argue that the Hit-rate should be used instead of the Oracle, as it covers both local and global information regarding the given pool of classifiers.

7.2. Pool generation

In the majority of DS publications, the pool of classifiers is generated using either well-known ensemble generation methods, such as Bagging, or by using heterogeneous classifiers [25,34]. The problem with such generation approaches is that they were proposed for static combination methods. In other words, they use a global approach in generating the base classifiers. Since these techniques look at the problem globally rather than locally, they do not guarantee the presence of local experts. For this reason, the DS methods may not be able to select the competent classifiers locally [166]. To the best of our knowledge, there is no classifier generation procedure that is adapted to dynamic selection techniques. We believe that the definition of a classifier generation procedure that takes the local information into account is a very promising research direction for improving the performance of all DS techniques.

Another research direction in terms of the generation of a pool of classifiers concerns the pool size. Normally, a large pool of classifiers is considered when DS techniques are evaluated. For instance, in [27,123,167] a pool composed of 100 base classifiers was used, while in [156] a pool composed of 1000 SVMs was employed. An interesting work conducted by Roy et al. [165,168] proposed a meta-regression model to predict the best size of the pool of classifiers based on complexity measures extracted from the classification problem. The results demonstrate that DS usually presents better classification results using, on average, 20 base classifiers. Hence, using smaller pools of classifiers can not only improve the accuracy, but also reduce the computational costs involved in DS techniques.
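A pool-size study of the kind discussed above can be carried out by evaluating a DS method on nested sub-pools of increasing size. The helper below is a sketch that assumes selector_factory wraps any fit/predict dynamic selection object (for example, the OLA sketch given in Section 6); it is not the protocol of [165,168].

```python
import numpy as np

def accuracy_vs_pool_size(pool, selector_factory, X_dsel, y_dsel, X_test, y_test,
                          sizes=(10, 20, 50, 100)):
    """Accuracy of a dynamic selection method as a function of the pool size,
    using the first m classifiers of an already trained pool."""
    y_test = np.asarray(y_test)
    scores = {}
    for m in sizes:
        ds = selector_factory(pool[:m]).fit(X_dsel, y_dsel)
        scores[m] = float((np.asarray(ds.predict(X_test)) == y_test).mean())
    return scores
```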
7.3. Region of competence definition

As reported in [108] and [107], the definition of the region of competence plays a very important role in the classification performance of a DS system, since the local competence of the base classifiers is estimated based on the samples belonging to this region. Hence, we believe an interesting research direction is to study the relationship between the samples belonging to the region of competence and the selection of the base classifiers; in other words, how the distribution of the region of competence changes the way the competence of the base classifiers is estimated. This relationship can be used to define new ways of demarcating the region of competence and selecting classifiers, and should additionally be taken into account during the classifier generation procedure.

Furthermore, the majority of the dynamic selection techniques use a fixed neighborhood size. This value of K is often used for multiple classification problems, regardless of their complexity. Another interesting future work would involve the prediction of the best K value according to the complexity of the classification problem, or the use of a variable neighborhood size that changes dynamically based on the location of the query sample in the feature space. In this case, one could, for instance, use a higher K value for samples that are closer to the decision border in order to reduce the influence of noisy samples.

Another interesting aspect regarding the region of competence is the work conducted by Oliveira et al. [109], in which the authors studied the estimated regions of competence in order to know whether or not the query sample is located in a region with borderline samples of different classes (called an indecision region). In the context of DS, a sample is located in an indecision region when its region of competence contains instances from different classes. The authors demonstrated that in such cases, many DS techniques can select classifiers with decision boundaries that do not cross the region of competence, assigning all samples in the region of competence to the same class. This may cause problems especially when dealing with imbalanced datasets, in which the majority of the samples in the region of competence belong to a single class (the majority class).

In order to deal with this problem, an online pruning framework was proposed in [109], which pre-selects classifiers with decision boundaries that cross the region of competence of the test instance, when the test instance is located in an indecision region. The first step of the framework is to determine whether or not the query instance is located in an indecision region. If so, the pruning mechanism is employed to pre-select the classifiers that cross the region of competence. Then, a DS technique is applied to select the most competent classifiers among the pre-selected pool. If the query is located in a safe region, the DS technique is used directly for classification. The proposed online pruning framework significantly improved the classification accuracy of the 9 DS techniques considered in the experimental analysis. This result demonstrates that, in order to be considered competent, a base classifier should not only obtain a high competence level, as estimated by the DS method, but must also cross the region of competence in the case of indecision regions.

However, one of the problems pointed out in this work is the fact that some samples were located in an indecision region and had no base classifiers with decision boundaries crossing their region of competence, which leads us to regard an ensemble generation technique that maximizes the number of base classifiers with decision boundaries crossing the indecision regions as a very promising research direction. This also supports our hypothesis that, when working with DS, we need an ensemble generation method that generates local experts rather than global ones.
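The two checks at the heart of the online pruning framework described above, namely whether a query falls in an indecision region and whether a base classifier's decision boundary crosses the region of competence, are easy to state in code. The sketch below is an illustration under the usual K-NN definition of the region of competence, not the implementation of [109].

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def region_of_competence(x_query, X_dsel, k=7):
    """Indices of the K nearest neighbors of the query in DSEL."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_dsel)
    _, idx = nn.kneighbors(np.asarray(x_query).reshape(1, -1))
    return idx[0]

def in_indecision_region(y_dsel, roc_idx):
    """True when the region of competence contains more than one class."""
    return np.unique(np.asarray(y_dsel)[roc_idx]).size > 1

def crosses_region(clf, X_roc):
    """True when the classifier assigns more than one class inside the region of
    competence, i.e., its decision boundary crosses that region."""
    return np.unique(clf.predict(X_roc)).size > 1
```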
7.4. Prototype selection and generation for DS

The rationale for using PS techniques is that the performance of DS techniques is highly dependent on the distribution of DSEL. When the samples in this set are not representative enough of the query sample, the DS technique may not select the most competent classifiers to predict its label. This phenomenon may occur as a result of a high degree of overlap between different classes, or may be due to the presence of noise [108]. Another important aspect of editing the distribution of DSEL is that it can also significantly reduce the computational complexity involved in applying dynamic selection techniques, since the definition of the region of competence is conducted using the K-NN technique, which can be very costly when dealing with large datasets.

Recent works have pointed out that the use of Prototype Selection (PS) [169] techniques can significantly improve the classification accuracy of several dynamic selection techniques [104,107]. In this case, the PS techniques are applied to edit the distribution of DSEL in the training stage. The edited DSEL, denoted by DSEL′, is then used for extracting the regions of competence during the generalization phase. In [104], we evaluated the impact of six prototype selection techniques on the classification accuracy and computational time of several dynamic selection techniques. The experimental analysis demonstrated that PS techniques, such as the Relative Neighborhood Graph, significantly improve the classification performance of all DS methods considered in the analysis. Moreover, it can also significantly reduce the size of DSEL and improve the recognition performance of several DS techniques.

However, as reported in [104], the techniques that present the best results for DS are not the ones that obtained the best classification accuracy when the 1-NN is considered as the classification scheme. For instance, the Generational Genetic Algorithm (GGA) and the CHC Adaptive Search Algorithm obtained the worst performance in the experimental study with DS techniques; however, they were among the top 5 best performing algorithms when the 1-NN was considered as the classification scheme [169]. The performance obtained by the CHC method was significantly worse when compared to the other PS techniques, as well as to the baseline result (i.e., the system without using PS).

This may be attributable to the fact that these techniques use the classification accuracy of the 1-NN technique in the fitness function for editing the dataset. An interesting direction for future work would involve using the performance of DS techniques (e.g., the accuracy of the OLA technique) as the criterion within those techniques, in order to adapt the distribution of DSEL for the use of DS techniques rather than the 1-NN classifier. Moreover, the use of Prototype Generation techniques [170] should also be evaluated for improving the performance of DS techniques, especially when dealing with imbalanced distributions [107].
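As a concrete and deliberately simple illustration of what editing DSEL means in practice, the sketch below applies Wilson's Edited Nearest Neighbours rule, one member of the PS family; it is not the Relative Neighborhood Graph editing evaluated in [104], and it is written for clarity rather than efficiency.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def edited_nearest_neighbours(X_dsel, y_dsel, k=3):
    """Drop every DSEL sample whose label disagrees with the majority of its
    k nearest neighbors (leave-one-out), returning the edited DSEL'."""
    X_dsel, y_dsel = np.asarray(X_dsel), np.asarray(y_dsel)
    keep = np.ones(len(y_dsel), dtype=bool)
    for i in range(len(y_dsel)):
        mask = np.arange(len(y_dsel)) != i
        knn = KNeighborsClassifier(n_neighbors=k).fit(X_dsel[mask], y_dsel[mask])
        keep[i] = (knn.predict(X_dsel[i:i + 1])[0] == y_dsel[i])
    return X_dsel[keep], y_dsel[keep]
```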
7.5. Diversity for DS

One of the most studied aspects in MCS is the concept of diversity. It is known that we need diversity in the classifier ensemble, since combining classifiers that always produce the same decision will not improve the recognition rate of the system. Diversity is often measured by the difference in the classifiers' decisions, such as in the Q-statistic and the disagreement measure [131].

To date, few dynamic selection techniques have utilized diversity together with different competence measures to perform ensemble selection [30,171]. However, the two DS techniques that take diversity information into account did not present a good overall performance in our experimental analysis (as shown in Section 6).

In our opinion, when dynamic selection is considered, a diverse pool of classifiers is required since, intuitively, a pool of diverse classifiers means that there are several classifiers specialized in different regions of the feature space. Consequently, the classifier pool has a better coverage of the whole feature space [172]. However, after selecting the EoC, we believe that we need to promote consensus, rather than diversity, among the selected classifiers. Intuitively, the non-competent classifiers for the corresponding local region will present high diversity when compared to the competent ones, since they perform differently at the local level. The addition of diverse classifiers at the ensemble level can therefore hinder the EoC decision. Moreover, if there is no consensus among the selected base classifiers, the system may end up randomly selecting the class of the query instance. This point is discussed by Sağlam and Street in a recent publication [156], who evaluate the concept of distant diversity.

Another important point to be investigated is the impact of diversity for dynamic ensembles, rather than static ones. The analysis conducted in [70,131] considered only static combination rules (e.g., Average, Product and Majority voting). In the case of DS techniques, the impact of diversity could be analyzed at the pool level, i.e., before selecting the base classifiers, as well as at the instance level, i.e., after the base classifiers are selected for the classification of the query.
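For reference, the pairwise diversity measures mentioned at the beginning of this subsection can be computed directly from the correctness of two base classifiers on a common evaluation set. The sketch below follows the standard definitions in [131]; variable names are illustrative.

```python
import numpy as np

def pairwise_diversity(pred_i, pred_j, y_true):
    """Disagreement measure and Q-statistic for two base classifiers."""
    ci = np.asarray(pred_i) == np.asarray(y_true)      # classifier i correct?
    cj = np.asarray(pred_j) == np.asarray(y_true)      # classifier j correct?
    n11 = np.sum(ci & cj)                               # both correct
    n00 = np.sum(~ci & ~cj)                             # both wrong
    n10 = np.sum(ci & ~cj)                              # only i correct
    n01 = np.sum(~ci & cj)                              # only j correct
    disagreement = (n01 + n10) / len(ci)
    q_statistic = (n11 * n00 - n01 * n10) / (n11 * n00 + n01 * n10 + 1e-12)
    return disagreement, q_statistic
```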
7.6. Cost-sensitive dynamic selection

In many real-world classification tasks, such as medical diagnosis and credit analysis, it is crucial to take into account the misclassification costs associated with each class. Furthermore, different classification costs can stem from the imbalanced nature of some classification problems. In other words, due to the imbalanced nature of the problem, the system may require different costs for the majority and minority classes.

Recently, several cost-sensitive ensemble approaches have been proposed, such as cost-sensitive tree ensembles [173], boosting [174] and cost-sensitive ensemble methods based on the ROC space [175,176]. However, these approaches are all based on static ensembles. To date, the dynamic selection literature considers all classes with the same cost, and no rejection mechanism has been proposed for DS [24]. Classification systems that deal with such applications often require a built-in rejection mechanism to avoid committing errors in very risky predictions. Thus, we believe that the definition of cost-sensitive DCS and DES techniques is another promising research direction for dynamic selection.

7.7. Imbalanced datasets

Imbalanced learning has recently attracted much attention from the pattern recognition community, since this kind of data is very common in real-world applications, e.g., biomedical data and spam detection. Classification in the presence of class imbalance is challenging, since the usual method of training and selecting standard classification models is based on classification accuracy. However, if we take the classification accuracy into account in such cases, the minority class could be totally ignored.

In our opinion, dynamic selection techniques can bring many benefits to this type of problem since they perform a local classification. The selection of the ensemble of classifiers is performed taking into account only the neighborhood of the query sample, rather than the whole dataset. Thus, we believe that the classifier selection scheme will not be biased towards the majority class.

To the best of our knowledge, there is just one publication that discusses the use of dynamic selection for imbalanced distributions [126]. However, the proposed system was only applied to two credit scoring datasets, which is not enough to evaluate whether or not DS can cope with class imbalance. In addition, this paper did not take into consideration the use of data preprocessing techniques such as SMOTE and RAMO [177]. Hence, another interesting future work would involve the evaluation of DS techniques for imbalanced distributions and, possibly, the definition or adaptation of DS techniques for this kind of application.

Acknowledgment

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), the École de technologie supérieure (ÉTS Montréal) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico).

References

[1] L.I. Kuncheva, A theoretical study on six classifier fusion strategies, IEEE Trans. Pattern Anal. Mach. Intell. 24 (2) (2002) 281–286.
[2] T.G. Dietterich, Ensemble methods in machine learning, in: International Workshop on Multiple Classifier Systems, Springer, 2000, pp. 1–15.
[3] L.I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms, Wiley-Interscience, 2004.
[4] M. Fernández-Delgado, E. Cernadas, S. Barro, D. Amorim, Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res. 15 (2014) 3133–3181.
[5] D. Opitz, R. Maclin, Popular ensemble methods: an empirical study, J. Artif. Intell. Res. 11 (1999) 169–198.
of the query. Intell. Res. 11 (1999) 169–198.
214 R.M.O. Cruz et al. / Information Fusion 41 (2018) 195–216
[6] R. Polikar, Ensemble based systems in decision making, IEEE Circuits Syst. [37] A. Tsymbal, M. Pechenizkiy, P. Cunningham, S. Puuronen, Dynamic integration
Mag. 6 (3) (2006) 21–45. of classifiers for handling concept drift, Inf. Fus. 9 (1) (2008) 56–68.
[7] S. Bashbaghi, E. Granger, R. Sabourin, G. Bilodeau, Dynamic selection of exem- [38] W. Qu, Y. Zhang, J. Zhu, Q. Qiu, Mining multi-label concept-drifting data
plar-svms for watch-list screening through domain adaptation, in: Proceed- streams using dynamic classifier ensemble, in: Asian Conference on Machine
ings of the 6th International Conference on Pattern Recognition Applications Learning, Springer, 2009, pp. 308–321.
and Methods (ICPRAM), 2017, pp. 738–745. [39] I. Mendialdua, J. Martínez-Otzeta, I. Rodriguez-Rodriguez, T. Ruiz-Vazquez,
[8] P.R.L. de Almeida, E.J. da Silva Júnior, T.M. Celinski, A. de Souza Britto, L.E.S. de B. Sierra, Dynamic selection of the best base classifier in one versus one,
Oliveira, A.L. Koerich, Music genre classification using dynamic selection of Knowl. Based Syst. 85 (2015) 298–306.
ensemble of classifiers, in: Systems, Man, and Cybernetics (SMC), 2012 IEEE [40] M. Galar, A. Fernández, E. Barrenechea, H. Bustince, F. Herrera, Dynamic clas-
International Conference on, IEEE, 2012, pp. 2700–2705. sifier selection for one-vs-one strategy: avoiding non-competent classifiers,
[9] S. Lessmann, B. Baesens, H.-V. Seow, L.C. Thomas, Benchmarking Pattern Recognit. 46 (12) (2013) 3412–3424.
state-of-the-art classification algorithms for credit scoring: an update of [41] Z.-L. Zhang, X.-G. Luo, S. García, J.-F. Tang, F. Herrera, Exploring the effective-
research, Eur. J. Oper. Res. 247 (1) (2015) 124–136. ness of dynamic ensemble selection in the one-versus-one scheme, Knowl.
[10] H. Xiao, Z. Xiao, Y. Wang, Ensemble classification based on supervised clus- Based Syst. 125 (2017) 53–63.
tering for credit scoring, Appl. Soft Comput. 43 (2016) 73–86. [42] L. Batista, E. Granger, R. Sabourin, Dynamic selection of generative–discrim-
[11] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, F. Herrera, A review inative ensembles for off-line signature verification, Pattern Recognit. 45 (4)
on ensembles for the class imbalance problem: bagging-, boosting-, and hy- (2012) 1326–1340.
brid-based approaches, IEEE Trans. Syst. Man. Cybern. Part C 42 (4) (2012) [43] S. Bashbaghi, E. Granger, R. Sabourin, G.-A. Bilodeau, Robust watch-list
463–484. screening using dynamic ensembles of svms based on multiple face repre-
[12] C. Porcel, A. Tejeda-Lorente, M. Martínez, E. Herrera-Viedma, A hybrid rec- sentations, Mach. Vis. Appl. (2017) 1–23.
ommender system for the selective dissemination of research resources in a [44] S. Bashbaghi, E. Granger, R. Sabourin, G.-A. Bilodeau, Dynamic ensembles of
technology transfer office, Inf. Sci. 184 (1) (2012) 1–19. exemplar-svms for still-to-video face recognition, Pattern Recognit 69 (2017)
[13] M. Jahrer, A. Töscher, R. Legenstein, Combining predictions for accurate 61–81.
recommender systems, in: Proceedings of the 16th ACM SIGKDD Interna- [45] C. Pagano, Adaptive classifier ensembles for face recognition in video-surveil-
tional Conference on Knowledge Discovery and Data Mining, ACM, 2010, lance, Ph.D. thesis, École de technologie supérieure, 2015.
pp. 693–702. [46] D. Ruta, B. Gabrys, Classifier selection for majority voting, Inf. Fus. 6 (1)
[14] D. Di Nucci, F. Palomba, R. Oliveto, A. De Lucia, Dynamic selection of classi- (2005) 63–81.
fiers in bug prediction: an adaptive method, IEEE Trans. Emerg. Topics Com- [47] R.P.W. Duin, The combining classifier: to train or not to train?, Proceedings of
put. Intell. 1 (3) (2017) 202–212. the 16th International Conference on Pattern Recognition 2 (2002) 765–770.
[15] A. Panichella, R. Oliveto, A. De Lucia, Cross-project defect prediction models: [48] L.K. Hansen, P. Salamon, Neural network ensembles, IEEE Trans. Pattern Anal.
L’union fait la force, in: Software Maintenance, Reengineering and Reverse Mach. Intell. 12 (10) (1990) 993–1001.
Engineering (CSMR-WCRE), 2014 Software Evolution Week-IEEE Conference [49] S.-B. Cho, J.H. Kim, Combining multiple neural networks by fuzzy integral for
on, IEEE, 2014, pp. 164–173. robust classification, IEEE Trans. Syst. Man Cybern. 25 (2) (1995) 380–384.
[16] G. Giacinto, F. Roli, L. Didaci, Fusion of multiple classifiers for intrusion detec- [50] L. Breiman, Bagging predictors, Mach. Learn. 24 (1996) 123–140.
tion in computer networks, Pattern Recognit. Lett. 24 (12) (2003) 1795–1803. [51] M. Skurichina, R.P.W. Duin, Bagging for linear classifiers, Pattern Recognit. 31
[17] G. Giacinto, R. Perdisci, M. Del Rio, F. Roli, Intrusion detection in computer (1998) 909–930.
networks by a modular ensemble of one-class classifiers, Inf. Fus. 9 (1) (2008) [52] Y. Freund, R.E. Schapire, A decision-theoretic generalization of on-line learn-
69–82. ing and an application to boosting, in: Proceedings of the Second European
[18] B. Krawczyk, L.L. Minku, J. Gama, J. Stefanowski, M. Woźniak, Ensemble learn- Conference on Computational Learning Theory, 1995, pp. 23–37.
ing for data stream analysis: a survey, Inf. Fus. 37 (2017) 132–156. [53] J. Feng, L. Wang, M. Sugiyama, C. Yang, Z.-H. Zhou, C. Zhang, Boosting and
[19] L.I. Kuncheva, Classifier ensembles for changing environments (2004) 1–15. margin theory, Front. Electr. Electron. Eng. 7 (1) (2012) 127–133.
[20] R. Polikar, L. Udpa, S. Udpa, S. Member, S. Member, V. Honavar, Learn++: an [54] A. Rahman, B. Verma, Novel layered clustering-based approach for generating
incremental learning algorithm for supervised neural networks, IEEE Trans. ensemble of classifiers, IEEE Trans. Neural Netw. 22 (5) (2011) 781–792.
Syst. Man Cybern. (C), Special Issue on Knowledge Management 31 (2001) [55] R.M. O. Cruz, G.D. C. Cavalcanti, T.I. Ren, An ensemble classifier for offline cur-
497–508. sive character recognition using multiple feature extraction techniques, Pro-
[21] M. Wozniak, M. Graña, E. Corchado, A survey of multiple classifier systems as ceedings of the International Joint Conference on Neural Networks (2010a)
hybrid systems, Inf. Fus. 16 (2014) 3–17. 744–751.
[22] L. Rokach, Ensemble-based classifiers, Artif. Intell. Rev. 33 (1) (2010) 1–39. [56] R.M. O. Cruz, G.D. C. Cavalcanti, T.I. Ren, Handwritten digit recognition us-
[23] Y. Ren, L. Zhang, P.N. Suganthan, Ensemble classification and regression-recent ing multiple feature extraction techniques and classifier ensemble, in: 17th
developments, applications and future directions, IEEE Comput. Intell. Mag. 11 International Conference on Systems, Signals and Image Processing, 2010,
(1) (2016) 41–53. pp. 215–218.
[24] A.S. Britto, R. Sabourin, L.E.S. de Oliveira, Dynamic selection of classifiers - a [57] A. Rahman, B. Verma, Effect of ensemble classifier composition on offline cur-
comprehensive review, Pattern Recognit. 47 (11) (2014) 3665–3680. sive character recognition, Inf. Process. Manage. 49 (4) (2013) 852–864.
[25] T. Woloszynski, M. Kurzynski, A probabilistic model of classifier competence [58] A. Schindler, R. Mayer, A. Rauber, Facilitating comprehensive benchmarking
for dynamic ensemble selection, Pattern Recognit. 44 (2011) 2656–2668. experiments on the million song dataset., in: ISMIR, 2012, pp. 469–474.
[26] A.H.R. Ko, R. Sabourin, u.S. Britto Jr., From dynamic classifier selection to dy- [59] T.K. Ho, The random subspace method for constructing decision forests, IEEE
namic ensemble selection, Pattern Recognit. 41 (2008) 1735–1748. Trans. Pattern Anal. Mach. Intell. 20 (1998) 832–844.
[27] R.M.O. Cruz, R. Sabourin, G.D.C. Cavalcanti, T.I. Ren, META-DES: a dynamic [60] M. Skurichina, R.P.W. Duin, Bagging, boosting and the random subspace
ensemble selection framework using meta-learning, Pattern Recognit. 48 (5) method for linear classifiers, Pattern Anal. Appl. 5 (2) (2002) 121–135.
(2015) 1925–1935. [61] R.P. Duin, D.M. Tax, Experiments with classifier combining rules, in: Interna-
[28] X. Zhu, X. Wu, Y. Yang, Dynamic classifier selection for effective mining from tional Workshop on Multiple Classifier Systems, 20 0 0, pp. 16–29.
noisy data streams, in: Proceedings of the 4th IEEE International Conference [62] W. Wang, P. Jones, D. Partridge, Diversity between neural networks and deci-
on Data Mining, 2004, pp. 305–312. sion trees for building multiple classifier systems, in: International Workshop
[29] L.I. Kuncheva, Clustering-and-selection model for classifier combination, in: on Multiple Classifier Systems, 20 0 0, pp. 240–249.
Fourth International Conference on Knowledge-Based Intelligent Information [63] L. Breiman, Random forests, Mach. Learn. 45 (1) (2001) 5–32.
Engineering Systems & Allied Technologies, 20 0 0, pp. 185–188. [64] L. Rokach, Decision forest: twenty years of research, Inf. Fus. 27 (2016)
[30] R.G.F. Soares, A. Santana, A.M.P. Canuto, M.C.P. de Souto, Using accuracy and 111–125.
diversity to select classifiers to build ensembles, in: Proceedings of the Inter- [65] J.J. Rodríguez, L.I. Kuncheva, C.J. Alonso, Rotation forest: a new classifier
national Joint Conference on Neural Networks, 2006, pp. 1310–1316. ensemble method, IEEE Trans. Pattern Anal. Mach. Intell. 28 (10) (2006)
[31] K. Woods, W.P. Kegelmeyer Jr., K. Bowyer, Combination of multiple classi- 1619–1630.
fiers using local accuracy estimates, IEEE Trans. Pattern Anal. Mach. Intell. 19 [66] G. Giacinto, F. Roli, Design of effective neural network ensembles for image
(1997) 405–410. classification purposes, Image Vis. Comput. 19 (9–10) (2001) 699–707.
[32] P.C. Smits, Multiple classifier systems for supervised remote sensing image [67] G. Giacinto, F. Roli, Design of effective neural network ensembles for image
classification based on dynamic classifier selection, IEEE Trans. Geosci. Re- classification purposes, Image Vis. Comput. 19 (9–10) (2001) 699–707.
mote Sens. 40 (4) (2002) 801–813. [68] M. Aksela, Comparison of classifier selection methods for improving commit-
[33] M. Sabourin, A. Mitiche, D. Thomas, G. Nagy, Classifier combination for hand- tee performance, in: International Workshop on Multiple Classifier Systems,
printed digit recognition, International Conference on Document Analysis and 2003, pp. 84–93.
Recognition (1993) 163–166. [69] R.M.O. Cruz, G.D.C. Cavalcanti, I.R. Tsang, R. Sabourin, Feature representation
[34] T. Woloszynski, M. Kurzynski, P. Podsiadlo, G.W. Stachowiak, A measure of selection based on classifier projection space and oracle analysis, Expert Syst.
competence based on random classification for dynamic ensemble selection, Appl. 40 (9) (2013) 3813–3827.
Inf. Fus. 13 (3) (2012) 207–213. [70] G. Brown, L.I. Kuncheva, “Good” and “bad” diversity in majority vote en-
[35] B. Krawczyk, M. Wozniak, Dynamic classifier selection for one-class classifi- sembles, in: International Workshop on Multiple Classifier Systems, Springer,
cation, Knowl. Based Syst. 107 (2016) 43–53. 2010, pp. 124–133.
[36] P.R.L. de Almeida, L.S. Oliveira, A. de Souza Britto Jr, R. Sabourin, Handling [71] E.M. dos Santos, R. Sabourin, P. Maupin, Overfitting cautious selection of clas-
concept drifts using dynamic selection of classifiers, in: Tools with Artificial sifier ensembles with genetic algorithms, Inf. Fus. 10 (2) (2009) 150–162.
Intelligence (ICTAI), 2016, pp. 989–995. [72] I. Partalas, G. Tsoumakas, I. Vlahavas, Focused ensemble selection: a diversi-
ty-based method for greedy ensemble selection, in: Proceeding of the 18th [106] P.R. Cavalin, R. Sabourin, C.Y. Suen, Dynamic selection approaches for multiple
European Conference on Artificial Intelligence, 2008, pp. 117–121. classifier systems, Neural Comput. Appl. 22 (3–4) (2013) 673–688.
[73] E.M. dos Santos, R. Sabourin, Classifier ensembles optimization guided by [107] R.M.O. Cruz, R. Sabourin, G.D. Cavalcanti, Prototype selection for dynamic
population oracle, in: IEEE Congress on Evolutionary Computation, 2011, classifier and ensemble selection, Neural Comput. Appl. (2016) 1–11.
pp. 693–698. [108] R.M.O. Cruz, R. Sabourin, G.D.C. Cavalcanti, A DEEP analysis of the META-DES
[74] B. Gabrys, D. Ruta, Genetic algorithms in classifier fusion, Appl. Soft Comput. framework for dynamic selection of ensemble of classifiers, CoRR (2015).
6 (4) (2006) 337–347. abs/1509.00825.
[75] Z.-H. Zhou, J. Wu, W. Tang, Ensembling neural networks: many could be bet- [109] D.V. Oliveira, G.D. Cavalcanti, R. Sabourin, Online pruning of base classi-
ter than all, Artif. Intell. 137 (1–2) (2002) 239–263. fiers for dynamic ensemble selection, Pattern Recognit. (2017), doi:10.1016/
[76] R.E. Banfield, L.O. Hall, K.W. Bowyer, W.P. Kegelmeyer, Ensemble diversity j.patcog.2017.06.030.
measures and their application to thinning, Inf. Fus. 6 (1) (2005) 49–62. [110] T.P.F. de Lima, A.T. Sergio, T.B. Ludermir, Improving classifiers and regions of
[77] L. Kuncheva, Fuzzy Classifier Design, vol. 49, Springer Science & Business Me- competence in dynamic ensemble selection, in: Intelligent Systems (BRACIS),
dia, 20 0 0. 2014 Brazilian Conference on, IEEE, 2014, pp. 13–18.
[78] R.P. Duin, D.M. Tax, Classifier conditional posterior probabilities, in: Joint [111] T.P.F. De Lima, T.B. Ludermir, Optimizing dynamic ensemble selection proce-
IAPR International Workshops on Statistical Techniques in Pattern Recogni- dure by evolutionary extreme learning machines and a noise reduction filter,
tion (SPR) and Structural and Syntactic Pattern Recognition (SSPR), Springer, in: Tools with Artificial Intelligence (ICTAI), 2013 IEEE 25th International Con-
1998, pp. 611–619. ference on, IEEE, 2013, pp. 546–552.
[79] J. Kittler, M. Hatef, R.P.W. Duin, J. Matas, On combining classifiers, IEEE Trans. [112] G. Giacinto, F. Roli, Dynamic classifier selection based on multiple classifier
Pattern Anal. Mach. Intell. 20 (1998) 226–239. behaviour, Pattern Recognit. 34 (2001) 1879–1881.
[80] T.K. Ho, J.J. Hull, S.N. Srihari, Decision combination in multiple classifier sys- [113] P.R. Cavalin, R. Sabourin, C.Y. Suen, Logid: an adaptive framework combining
tems, IEEE Trans. Pattern Anal. Mach. Intell. 16 (1) (1994) 66–75. local and global incremental learning for dynamic selection of ensembles of
[81] Y.S. Huang, C.Y. Suen, A method of combining multiple experts for the HMMs, Pattern Recognit. 45 (9) (2012) 3544–3556.
recognition of unconstrained handwritten numerals, IEEE Trans. Pattern Anal. [114] L. Rastrigin, R. Erenstein, Method of Collective Recognition, vol. 595, 1981. (in
Mach. Intell. 17 (1995) 90–94. Russian).
[82] L.I. Kuncheva, J.C. Bezdek, R.P.W. Duin, Decision templates for multiple [115] C. Lin, W. Chen, C. Qiu, Y. Wu, S. Krishnan, Q. Zou, Libd3c: ensemble classifiers
classifier fusion: an experimental comparison, Pattern Recognit. 34 (2001) with a clustering and dynamic selection strategy, Neurocomputing 123 (2014)
299–314. 424–435.
[83] Y. Lu, Knowledge integration in a multiple classifier system, Appl. Intell. 6 (2) [116] M.C. de Souto, R.G. Soares, A. Santana, A.M. Canuto, Empirical comparison
(1996) 75–86. of dynamic classifier selection methods based on diversity and accuracy
[84] G.L. Rogova, Combining the results of several neural network classifiers, Neu- for building ensembles, in: Neural Networks, 2008. IJCNN 2008.(IEEE World
ral Netw. 7 (5) (1994) 777–781. Congress on Computational Intelligence). IEEE International Joint Conference
[85] D.M.J. Tax, M. van Breukelen, R.P.W. Duin, J. Kittler, Combining multiple on, IEEE, 2008, pp. 1480–1487.
classifiers by averaging or by multiplying? Pattern Recognit. 33 (9) (20 0 0) [117] J. Wang, P. Neskovic, L.N. Cooper, Improving nearest neighbor rule with a sim-
1475–1485. ple adaptive distance measure, Pattern Recognit. Lett. 28 (2007) 207–213.
[86] A.K. Jain, R.P.W. Duin, J. Mao, Statistical pattern recognition: a review, IEEE [118] J. Wang, P. Neskovic, L.N. Cooper, Neighborhood size selection in the k-n-
Trans. Pattern Anal. Mach. Intell. 22 (1) (20 0 0) 4–37. earest-neighbor rule using statistical confidence, Pattern Recognit. 39 (2006)
[87] L. Lam, C.Y. Suen, Optimal combinations of pattern classifiers, Pattern Recog- 417–423.
nit. Lett. 16 (9) (1995) 945–954. [119] B. Sierra, E. Lazkano, I. Irigoien, E. Jauregi, I. Mendialdua, K nearest neighbor
[88] D.H. Wolpert, Stacked generalization, Neural Netw. 5 (1992) 241–259. equality: giving equal chance to all existing classes, Inf. Sci. 181 (23) (2011)
[89] Š. Raudys, Trainable fusion rules. ii. small sample-size effects, Neural Netw. 5158–5168.
19 (10) (2006) 1517–1527. [120] T. Woloszynski, M. Kurzynski, On a new measure of classifier competence ap-
[90] Š. Raudys, Trainable fusion rules. i. large sample size case, Neural Netw. 19 plied to the design of multiclassifier systems, in: International Conference on
(10) (2006) 1506–1516. Image Analysis and Processing (ICIAP), 20 09, pp. 995–10 04.
[91] R.A. Jacobs, M.I. Jordan, S.J. Nowlan, G.E. Hinton, Adaptive mixtures of local [121] L. Batista, E. Granger, R. Sabourin, Dynamic ensemble selection for off-line
experts, Neural Comput 3 (1991) 79–87. signature verification, in: International Workshop on Multiple Classifier Sys-
[92] S. Masoudnia, R. Ebrahimpour, Mixture of experts: a literature survey, Artif. tems, 2011, pp. 157–166.
Intell. Rev. (2014) 1–19. [122] K. M., W. T., R. Lysiak, On two measures of classifier competence for dy-
[93] S.E. Yuksel, J.N. Wilson, P.D. Gader, Twenty years of mixture of experts, IEEE namic ensemble selection - experimental comparative analysis, in: Interna-
Trans. Neural Netw. Learn. Syst. 23 (8) (2012) 1177–1193. tional Symposium on Communications and Information Technologies, 2010,
[94] H. Cevikalp, R. Polikar, Local classifier weighting by quadratic programming, pp. 1108–1113.
IEEE Trans. Neural Netw. 19 (10) (2008) 1832–1838. [123] A.L. Brun, A.S.B. Jr., L.S. Oliveira, F. Enembreck, R. Sabourin, Contribution of
[95] D. Jiménez, Dynamically weighted ensemble neural networks for classifica- data complexity features on dynamic classifier selection, in: International
tion, in: Neural Networks Proceedings, 1998. IEEE World Congress on Com- Joint Conference on Neural Networks (IJCNN), 2016, pp. 4396–4403.
putational Intelligence. The 1998 IEEE International Joint Conference on, 1, [124] R.M.O. Cruz, R. Sabourin, G.D.C. Cavalcanti, On meta-learning for dynamic en-
IEEE, 1998, pp. 753–756. semble selection, in: 22nd International Conference on Pattern Recognition
[96] D. Štefka, M. Holeňa, Dynamic classifier aggregation using interaction-sensi- (ICPR), 2014, pp. 1230–1235.
tive fuzzy measures, Fuzzy Sets Syst. 270 (2015) 25–52. [125] F. Pinto, C. Soares, J. Mendes-Moreira, Chade: metalearning with classifier
[97] R.M.O. Cruz, R. Sabourin, G.D.C. Cavalcanti, META-DES.H: a dynamic ensemble chains for dynamic combination of classifiers, in: Joint European Conference
selection technique using meta-learning and a dynamic weighting approach, on Machine Learning and Knowledge Discovery in Databases, Springer, 2016,
in: International Joint Conference on Neural Networks, 2015, pp. 1–8. pp. 410–425.
[98] L.M. Vriesmann, A.S. Britto, L.S. Oliveira, A.L. Koerich, R. Sabourin, Combining [126] J. Xiao, L. Xie, C. He, X. Jiang, Dynamic classifier ensemble model for customer
overall and local class accuracies in an oracle-based method for dynamic en- classification with imbalanced class distribution, Expert Syst. Appl. 39 (2012)
semble selection, in: Neural Networks (IJCNN), 2015 International Joint Con- 3668–3675.
ference on, IEEE, 2015, pp. 1–7. [127] E.M. Dos Santos, R. Sabourin, P. Maupin, A dynamic overproduce-and-choose
[99] R.M.O. Cruz, R. Sabourin, G.D.C. Cavalcanti, Meta-des. Oracle: meta-learning strategy for the selection of classifier ensembles, Pattern Recognit. 41 (2008)
and feature selection for dynamic ensemble selection, Inf. Fus. 38 (2017) 2993–3009.
84–103. [128] M. Wozniak, Hybrid Classifiers: Methods of Data, Knowledge, and Classifier
[100] M. Wozniak, M. Zmyslony, Designing fusers on the basis of discrimi- Combination, Springer, 2013.
nants–evolutionary and neural methods of training, in: International Confer- [129] G. Giacinto, F. Roli, Methods for dynamic classifier selection, in: Image Anal-
ence on Hybrid Artificial Intelligence Systems, Springer, 2010, pp. 590–597. ysis and Processing, 1999. Proceedings. International Conference on, IEEE,
[101] L. Didaci, G. Giacinto, F. Roli, G.L. Marcialis, A study on the performances 1999, pp. 659–664.
of dynamic classifier selection based on local accuracy estimation, Pattern [130] R.M.O. Cruz, Dynamic Selection of Ensemble of Classifiers Using Meta-Learn-
Recognit. 38 (11) (2005) 2188–2191. ing, Ph.D. thesis, École de Technologie Supérieure, 2016.
[102] R.M. O. Cruz, G.D. C. Cavalcanti, T.I. Ren, A method for dynamic ensemble [131] C.A. Shipp, L.I. Kuncheva, Relationships between combination methods and
selection based on a filter and an adaptive distance to improve the quality of measures of diversity in combining classifiers, Inf. Fus. 3 (2002) 135–148.
the regions of competence, Proceedings of the International Joint Conference [132] T.K. Ho, M. Basu, Complexity measures of supervised classification problems,
on Neural Networks (2011) 1126–1133. IEEE Trans. Pattern Anal. Mach. Intell. 24 (3) (2002) 289–300.
[103] L. Didaci, G. Giacinto, Dynamic classifier selection by adaptive k-near- [133] M. Galar, A. Fernández, E. Barrenechea, H. Bustince, F. Herrera, An overview
est-neighbourhood rule, in: International Workshop on Multiple Classifier of ensemble methods for binary classifiers in multi-class problems: experi-
Systems, Springer, 2004, pp. 174–183. mental study on one-vs-one and one-vs-all schemes, Pattern Recognit. 44 (8)
[104] R.M.O. Cruz, R. Sabourin, G.D.C. Cavalcanti, Analyzing different prototype se- (2011) 1761–1776.
lection techniques for dynamic classifier and ensemble selection, in: Interna- [134] M. Galar, A. Fernández, E. Barrenechea, F. Herrera, Drcw-ovo: distance-based
tional Joint Conference on Neural Networks (IJCNN), 2017, pp. 3959–3966. relative competence weighting combination for one-vs-one strategy in multi-
[105] T. Woloszynski, M. Kurzynski, A measure of competence based on randomized -class problems, Pattern Recognit. 48 (1) (2015) 28–42.
reference classifier for dynamic ensemble selection, in: International Confer- [135] D.M.J. Tax, One-class classification: Concept Learning in the Absence of Coun-
ence on Pattern Recognition (ICPR), 2010, pp. 4194–4197. ter-Examples, Ph.D. thesis, Technische Universiteit Delft, 2001.
[136] B. Antosik, M. Kurzynski, New measures of classifier competence – heuristics dures for redundant systems of hypotheses, in: Multiple Hypothesenprü-
and application to the design of multiple classifier systems., in: Computer fung/Multiple Hypotheses Testing, Springer, 1988, pp. 100–115.
Recognition Systems, vol. 4, 2011, pp. 197–206. [159] S. Garcia, F. Herrera, An extension on“statistical comparisons of classifiers
[137] J. Martins, L.S. Oliveira, A. Britto, R. Sabourin, Forest species recognition based over multiple data sets”for all pairwise comparisons, J. Mach. Learn. Res. 9
on dynamic classifier selection and dissimilarity feature vector representa- (Dec) (2008) 2677–2694.
tion, Mach Vis Appl 26 (2–3) (2015) 279–293. [160] J. Derrac, S. García, D. Molina, F. Herrera, A practical tutorial on the use of
[138] A.T. Sergio, T.P. de Lima, T.B. Ludermir, Dynamic selection of forecast combin- nonparametric statistical tests as a methodology for comparing evolutionary
ers, Neurocomputing 218 (2016) 37–50. and swarm intelligence algorithms, Swarm Evol. Comput. 1 (1) (2011) 3–18.
[139] M. Kurzynski, A. Wolczowski, Dynamic selection of classifiers ensemble ap- [161] J. Zhu, H. Zou, S. Rosset, T. Hastie, Multi-class adaboost, Stat. Interface 2 (3)
plied to the recognition of emg signal for the control of bioprosthetic hand, (2009) 349–360.
in: Control, Automation and Systems (ICCAS), 2011 11th International Confer- [162] S. Bernard, L. Heutte, S. Adam, Influence of hyperparameters on random forest
ence on, IEEE, 2011, pp. 382–386. accuracy, in: International Workshop on Multiple Classifier Systems, Springer,
[140] M. Kurzynski, M. Krysmann, P. Trajdos, A. Wolczowski, Multiclassifier system 2009, pp. 171–180.
with hybrid learning applied to the control of bioprosthetic hand, Comput. [163] S. Bernard, L. Heutte, S. Adam, Forest-rk: a new random forest induction
Biol. Med. 69 (2016) 286–297. method, in: International Conference on Intelligent Computing, Springer,
[141] M. Kurzynski, M. Krysmann, P. Trajdos, A. Wolczowski, Two-stage multiclas- 2008, pp. 430–437.
sifier system with correction of competence of base classifiers applied to the [164] M.T. Hagan, M.B. Menhaj, Training feedforward networks with the marquardt
control of bioprosthetic hand, in: Tools with Artificial Intelligence (ICTAI), algorithm, IEEE Trans. Neural Netw. 5 (6) (1994) 989–993.
2014 IEEE 26th International Conference on, IEEE, 2014, pp. 620–626. [165] A. Roy, R.M.O. Cruz, R. Sabourin, G.D. Cavalcanti, Meta-regression based pool
[142] D.J. Hand, W.E. Henley, Statistical classification methods in consumer credit size prediction scheme for dynamic selection of classifiers, in: 23rd Interna-
scoring: a review, J. R. Statist. Soc. 160 (3) (1997) 523–541. tional Conference on Pattern Recognition (ICPR), 2016, pp. 216–221.
[143] C.N. Silla Jr, A.L. Koerich, C.A. Kaestner, The latin music database., in: ISMIR, [166] M.A. Souza, G.D. Cavalcanti, R.M.O. Cruz, R. Sabourin, On the characterization
2008, pp. 451–456. of the oracle for dynamic classifier selection, in: International Joint Confer-
[144] J. Milgram, M. Cheriet, R. Sabourin, Estimating accurate multi-class proba- ence on Neural Networks (IJCNN), 2017, pp. 332–339.
bilities with support vector machines, in: Neural Networks, 2005. IJCNN’05. [167] E.M. dos Santos, R. Sabourin, P. Maupin, A dynamic overproduce-and-choose
Proceedings. 2005 IEEE International Joint Conference on, vol. 3, IEEE, 2005, strategy for the selection of classifier ensembles, Pattern Recognit. 41 (10)
pp. 1906–1911. (20 08) 2993–30 09.
[145] L.S. Oliveira, R. Sabourin, F. Bortolozzi, C.Y. Suen, Automatic recognition of [168] A. Roy, R.M.O. Cruz, R. Sabourin, G.D.C. Cavalcanti, Meta-learning recom-
handwritten numerical strings: a recognition and verification strategy, IEEE mendation of default size of classifier pool for META-DES, vol. 216, 2016,
Trans. Pattern Anal. Mach. Intell. 24 (11) (2002) 1438–1454. pp. 351–362.
[146] F. Vargas, M. Ferrer, C. Travieso, J. Alonso, Off-line handwritten signature [169] S. Garcia, J. Derrac, J. Cano, F. Herrera, Prototype selection for nearest neigh-
gpds-960 corpus, in: Document Analysis and Recognition, 2007. ICDAR 2007. bor classification: taxonomy and empirical study, IEEE Trans. Pattern Anal.
Ninth International Conference on, vol. 2, IEEE, 2007, pp. 764–768. Mach. Intell. 34 (3) (2012) 417–435.
[147] A. Wolczowski, M. Kurzynski, Human-machine interface in bioprosthesis con- [170] I. Triguero, J. Derrac, S. García, F. Herrera, A taxonomy and experimental study
trol using EMG signal classification, Expert Syst. 27 (1) (2010) 53–70. on prototype generation for nearest neighbor classification, IEEE Trans. Syst.
[148] K. Bache, M. Lichman, UCI Machine Learning Repository, 2013. Man Cybern. Part C 42 (1) (2012) 86–100.
[149] R.D. King, C. Feng, A. Sutherland, Statlog: Comparison of Classification Algo- [171] R. Lysiak, M. Kurzynski, T. Woloszynski, Probabilistic approach to the dynamic
rithms on Large Real-World Problems, 1995. ensemble selection using measures of competence and diversity of base clas-
[150] J. Alcalá-Fdez, A. Fernández, J. Luengo, J. Derrac, S. García, KEEL data-mining sifiers, in: International Conference on Hybrid Artificial Intelligence Systems,
software tool: data set repository, integration of algorithms and experimental 2011, pp. 229–236.
analysis framework, Mult. Val. Log. Soft Comput. 17 (2–3) (2011) 255–287. [172] T.K. Ho, Complexity of classification problems and comparative advantages of
[151] L. Kuncheva, Ludmila kuncheva collection LKC, 2004. combined classifiers, in: International Workshop on Multiple Classifier Sys-
[152] R.P.W. Duin, P. Juszczak, D. de Ridder, P. Paclik, E. Pekalska, D.M. Tax, Prtools, tems, Springer, 20 0 0, pp. 97–106.
a matlab toolbox for pattern recognition, 2004. [173] B. Krawczyk, M. Woźniak, G. Schaefer, Cost-sensitive decision tree ensembles
[153] S. Mirjalili, A. Lewis, S-shaped versus v-shaped transfer functions for binary for effective imbalanced classification, Appl. Soft Comput. 14 (2014) 554–562.
particle swarm optimization, Swarm Evol. Comput. 9 (2013) 1–14. [174] Y. Sun, M.S. Kamel, A.K. Wong, Y. Wang, Cost-sensitive boosting for classifica-
[154] M. Friedman, The use of ranks to avoid the assumption of normality implicit tion of imbalanced data, Pattern Recognit. 40 (12) (2007) 3358–3378.
in the analysis of variance, J. Am. Stat. Assoc. 32 (200) (1937) 675–701. [175] S. Bernard, C. Chatelain, S. Adam, R. Sabourin, The multiclass roc front method
[155] J. Demšar, Statistical comparisons of classifiers over multiple data sets, J. for cost-sensitive classification, Pattern Recognit. 52 (2016) 46–60.
Mach. Learn. Res. 7 (2006) 1–30. [176] C. Dubos, S. Bernard, S. Adam, R. Sabourin, Roc-based cost-sensitive classifica-
[156] Ş.Y. Sağlam, W.N. Street, Distant diversity in dynamic class prediction, Ann. tion with a reject option, in: International Conference on Pattern Recognition
Oper. Res. (2016) 1–15. (ICPR), 2016, pp. 3320–3325.
[157] A. Benavoli, G. Corani, F. Mangili, Should we really use post-hoc tests based [177] J. Díez-Pastor, J.J. Rodríguez, C. García-Osorio, L.I. Kuncheva, Diversity tech-
on mean-ranks, J. Mach. Learn. Res. 17 (5) (2016) 1–10. niques improve the performance of the best imbalance learning ensembles,
[158] B. Bergmann, G. Hommel, Improvements of general multiple test proce- Inf. Sci. 325 (2015) 98–117.