0% found this document useful (0 votes)

107 views

Dynamic Classifier Selection

This document provides a review and taxonomy of dynamic classifier selection techniques from multiple classifier systems. It discusses three key aspects of dynamic selection: (1) whether a single classifier or ensemble is selected, (2) the method used to define the local region to estimate classifier competence, and (3) the criteria to estimate competence level. The paper categorizes state-of-the-art techniques based on the taxonomy and compares 18 techniques' performance experimentally. It also discusses uses of dynamic selection in contexts like one-class classification and concept drift, and presents open questions to guide future research.

Uploaded by

Yusuf Yeşilyurt

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

107 views

Dynamic Classifier Selection

Uploaded by

Yusuf Yeşilyurt

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 22

Information Fusion 41 (2018) 195–216

Contents lists available at ScienceDirect

Information Fusion
journal homepage: www.elsevier.com/locate/inffus

Dynamic classiﬁer selection: Recent advances and perspectives

Rafael M.O. Cruz a,∗, Robert Sabourin a, George D.C. Cavalcanti b
a
LIVIA, École de Technologie Supérieure, University of Quebec, Montreal, Québec, Canada
b
Centro de Informática, Universidade Federal de Pernambuco, Recife, PE, Brazil

a r t i c l e i n f o a b s t r a c t

Article history: Multiple Classifier Systems (MCS) have been widely studied as an alternative for increasing accuracy in
Received 5 July 2017 pattern recognition. One of the most promising MCS approaches is Dynamic Selection (DS), in which the
Revised 1 September 2017
base classifiers are selected on the fly, according to each new sample to be classified. This paper provides
Accepted 10 September 2017
a review of the DS techniques proposed in the literature from a theoretical and empirical point of view.
Available online 11 September 2017
We propose an updated taxonomy based on the main characteristics found in a dynamic selection system:
Keywords: (1) The methodology used to define a local region for the estimation of the local competence of the base
Multiple classifier systems classifiers; (2) The source of information used to estimate the level of competence of the base classifiers,
Ensemble of classifiers such as local accuracy, oracle, ranking and probabilistic models, and (3) The selection approach, which
Dynamic classifier selection determines whether a single or an ensemble of classifiers is selected. We categorize the main dynamic
Dynamic ensemble selection selection techniques in the DS literature based on the proposed taxonomy. We also conduct an extensive
Classifier competence
experimental analysis, considering a total of 18 state-of-the-art dynamic selection techniques, as well as
Survey
static ensemble combination and single classification models. To date, this is the first analysis comparing
all the key DS techniques under the same experimental protocol. Furthermore, we also present several
perspectives and open research questions that can be used as a guide for future works in this domain.
© 2017 Elsevier B.V. All rights reserved.

1. Introduction classiﬁed. DS has become an active research topic in the multiple

classifier systems literature in past years. This has been due to the
Multiple Classifier System (MCS) is a very active area of re- fact that more and more works are reporting the superior perfor-
search in machine learning and pattern recognition. In recent mance of such techniques over traditional combination approaches,
years, several studies have been published demonstrating its ad- such as majority voting and Boosting [24–27]. DS techniques work
vantages over individual classifier models based on theoretical [1– by estimating the competence level of each classifier from a pool
3] and empirical [4–6] evaluations. They are widely used to solve of classifiers. Only the most competent, or an ensemble containing
many real-world problems, such as face recognition [7], music the most competent classifiers is selected to predict the label of a
genre classification [8], credit scoring [9,10], class imbalance [11], specific test sample. The rationale for such techniques is that not
recommender system [12,13], software bug prediction [14,15], in- every classifier in the pool is an expert in classifying all unknown
trusion detection [16,17], and for dealing with changing environ- samples; rather, each base classifier is an expert in a different local
ments [18–20]. region of the feature space [28].
Several approaches are currently used to construct an MCS, and In dynamic selection, the key is how to select the most com-
they have been presented in many excellent reviews covering dif- petent classifiers for any given query sample. Usually, the compe-
ferent aspects of MCS [3,6,21–23]. One of the most promising MCS tence of the classifiers is estimated based on a local region of the
approaches is Dynamic Selection (DS), in which the base classi- feature space where the query sample is located. This region can
fiers1 are selected on the fly, according to each new sample to be be defined by different methods, such as applying the K-Nearest
Neighbors technique, to find the neighborhood of this query sam-
ple, or by using clustering techniques [29,30]. Then, the compe-
∗
Corresponding author. tence level of the base classifiers is estimated, considering only the
E-mail addresses: [email protected], [email protected] (R.M.O. Cruz), samples belonging to this local region according to any selection
[email protected] (R. Sabourin), [email protected] (G.D.C. Cavalcanti).
criteria; these include the accuracy of the base classifiers in this lo-
URL: http://www.livia.etsmtl.ca (R.M.O. Cruz), http://www.cin.ufpe.br/˜viisar
cal region [30–32] or ranking [33] and probabilistic models [25,34].
(G.D.C. Cavalcanti)
1
The term base classifier refers to a single classifier belonging to an ensemble or At this point, the classifier(s) that attained a certain competence
a pool of classifiers. level is (are) selected.

http://dx.doi.org/10.1016/j.inffus.2017.09.010
1566-2535/© 2017 Elsevier B.V. All rights reserved.
196 R.M.O. Cruz et al. / Information Fusion 41 (2018) 195–216

In this paper, we present an updated taxonomy of dynamic 2. Basic concepts

classifier and ensemble selection techniques, taking into account
the following three aspects: (1) The selection approach, which con- This section presents the main concepts comprised in DS ap-
siders, whether a single classifier is selected (this is known as Dy- proaches. They provide the background needed to understand how
namic Classifier Selection (DCS)) or an ensemble is selected (this DS techniques work as well as the main challenges involved in this
for its part is known as Dynamic Ensemble Selection (DES)); (2) class of techniques. The following mathematical notation is used in
The method used to define the local region in which the local com- this paper:
petences of the base classifiers are estimated, and (3) The selection
• C = {c1 , . . . , cM } is the pool consisting of M base classifiers.
criteria used to estimate the competence level of the classifier. We
• xj is a test sample with an unknown class label.
review and categorize the state-of-the-art dynamic classifier and
ensemble selection techniques based on the proposed taxonomy.
• θ j = {x1 , . . . , xK } is the region of competence of xj , and xk is
one instance belonging to θ j .
We also discuss the increasing use of dynamic selection tech-
• P(ωl | xj , ci ) Posterior probability obtained by the classifier ci
niques, considering different classification contexts, such as One-
for the instance xj .
Class Classification (OCC) [35], concept drift [36–38], One-Versus-
One (OVO) decomposition problems [39–41], as well as the ap-
• Wk = d1 , and dk is the Euclidean distance between the query xj
k
plication of DS techniques to solve complex real-world problems and its neighbour sample xk .
such as signature verification [42], face recognition [7,43–45], mu- • δ i, j is the estimated competence of the base classifier ci for the
sic classification [8] and credit scoring [10]. In particular, we de- classification of xj .
scribe how the properties of dynamic selection techniques can be • = {ω1 , . . . , ωL } is the set of L class labels in the classification
used to handle the intrinsic properties of each problem. problem.
An experimental analysis is conducted comparing the per- • ωl is the class predicted by ci .
formance of 18 state-of-the-art dynamic classifier and ensem- • S(xk ) = {S1 (xk ), . . . , SL (xk )} is the vector of class supports ob-
ble selection techniques over multiple classification datasets. The tained by the classifier ci for a given sample xk ∈ θ j .
DS techniques are also compared against the baseline methods, • x˜ j is the output profile of an instance xj .
namely, (1) Static Selection (SS), i.e., the selection of an ensem- • φ j is the set containing the most similar output profiles of the
ble of classifiers during the training stage of the system [46]; (2) query sample, x˜ j , computed in the decision space.
Single Best (SB), which corresponds to the performance of the best
A multiple classifier system is essentially composed of three
classifier in the pool according to the validation data, and majority
stages [24]: (1) Generation, (2) Selection, and (3) Aggregation or
voting (MV), which corresponds to the majority voting combina-
Fusion (Fig. 1). In the generation stage, a pool of classifiers is
tion of all classifiers in the pool without any pre-selection of clas-
trained to create a set of classifiers that are both accurate and
sifiers. To allow a fair comparison of the techniques, all the DS and
diverse. In the selection stage, an ensemble, containing the most
static techniques were evaluated using the same experimental pro-
competent classifiers, is selected. It must be mentioned that the
tocol, i.e., the same division of datasets, as well as the same pool
selection stage is optional, as it is not used by some MCS algo-
of classifiers. The performance of the DS techniques was also com-
rithms. In the last phase, the outputs of the base experts are ag-
pared with those of the best classification models according to [4],
gregated to give the final decision of the system.
including Support Vector Machine (SVM) and Random Forests.
Several works have been proposed for each of the three phases.
The contributions of this paper in relation to other reviews in
In Fig. 2, we present the taxonomy of an MCS, considering the
classifiers ensembles are:
main approaches for classifier generation, selection and integra-
tion. The selection phase is highlighted as it is the main focus of
this review. The taxonomy of the three MCS stages is described in
1. It proposes an updated taxonomy of dynamic selection tech-
the following sections.
niques.
2. It discusses the use of dynamic selection techniques on differ-
2.1. Classifier generation
ent contexts, including One-Versus-One decomposition (OVO),
and One-Class Classification (OCC).
The goal of the classifier generation step is to create a pool,
3. It reviews the use of DS techniques to solve complex real-world
C = {c1 , . . . , cM }, containing M classifiers that are both accurate and
problems such as image classification and biomedical applica-
diverse. The base classifiers should be different, since there is no
tions.
reason to combine experts that always present the same output.
4. It presents an empirical comparison between several state-of-
There are six main strategies to generate a diverse pool of clas-
the-art dynamic selection techniques over several classification
sifiers [3,47]. Similar to [47], the methods are ordered here such
datasets under the same experimental protocol.
that the lower an item appears on the list, the more successful the
5. It discusses the most recent findings in this field, and examines
combination stages will be, as the generated base experts will be
the open questions that can be addressed in future works.
more diverse and informative:

1. Different initializations: If the training process is initialization

This work is organized as follows: Section 2 presents an dependent, different initializations may result in different clas-
overview of multiple classifier system approaches. In Section 3, we sifiers. This holds, for instance, for neural networks, where the
propose an updated dynamic selection taxonomy, and discuss each initial configuration of weights changes the final model [48,49].
component of a DS technique. In Section 4, we describe the most 2. Different parameters: In this case, the base experts are gen-
relevant dynamic selection methods and categorize them accord- erated with different configurations of the hyper-parameters,
ing to the proposed taxonomy. Section 5 presents a review of sev- which leads to different decision boundaries; for example,
eral real-world applications that use dynamic selection techniques the combination of Support Vector Machine (SVM) classifiers
to achieve a higher classification accuracy. An empirical compar- trained with distinct values of cost and kernel scale.
ison between the state-of-the-art DS techniques is conducted in 3. Different architectures: Like training multiple Multi-Layer Per-
Section 6. The conclusion and perspectives for future research in ceptron (MLP) neural networks with different numbers of hid-
dynamic selection are given in the last section. den layers.
R.M.O. Cruz et al. / Information Fusion 41 (2018) 195–216 197

Fig. 1. The three possible phases of an MCS system. In the first stage, a pool of classifiers C = {c1 , . . . , cM } (M is the number of classifiers) is generated. In the second phase,

an Ensemble of Classifiers (EoC), C ⊆ C is selected. In the last phase, the decisions of the selected base classifiers are aggregated to give the final decision.

Fig. 2. Taxonomy of an MCS system considering the three main phases. The selection stage is highlighted since it is the focus of this review.

4. Different classifier models: This method involves the combina- search algorithms have been considered for static selection, such as
tion of different classification models (decision tree, K-Nearest greedy search [72,73], evolutionary algorithms [71,74,75] and other
Neighbor (K-NN) and SVM, for example). In this case, diversity heuristic approaches [69,76].
is due to the intrinsic properties of each model, which change In contrast, in dynamic selection, a single classifier or an en-
the way the decision boundaries are generated. Systems that semble is selected specifically to classify each unknown example.
use different classifier models are often called heterogeneous Based on a pool of classifiers C, dynamic selection techniques con-
models. sist in finding a single classifier ci , or an ensemble of classifiers

5. Different training sets: In this case, each base classifier is C ⊆ C, having the most competent classifiers for the classification
trained using a different distribution of the training set. The of a specific query, xj . The rationale for dynamic selection tech-
Bagging [50,51], Boosting [52,53] and clustering-based classi- niques is that each base classifier is an expert in distinct regions
fier generation approaches [10,54] are examples of generation of the feature space. The method aims to select the most compe-
methods that are based on this paradigm. tent classifiers in the local region where xj is located.
6. Different feature sets: This methodology is used in applica-
tions where the data can be represented in distinct feature 2.3. Aggregation
spaces. In face recognition, for example, multiple feature ex-
traction methods can be applied to extract distinct sets of fea- The aggregation phase consists in fusing the outputs obtained
tures based on a face image [43]. The same principle applies by the selected classifiers according to a combination rule. The
to handwriting recognition [55–57] and music genre classifica- combination of the base classifiers can be performed based on
tion [58]. Each base expert can be trained based on a differ- the class labels, such as in the Majority Voting scheme, or by
ent feature extraction method. Furthermore, different feature using the scores obtained by the base classifier for each of the
spaces can be generated based on one feature space through classes in the classification problem. In this approach, the scores
the selection of a subset of features, just as in the random sub- obtained by the base classifiers are interpreted as fuzzy class mem-
space method [59,60]. berships [77] or the posterior probability that a sample xj belongs
to a given class [78].
Empirical studies of different generation strategies can be found There are three main strategies for the aggregation phase: non-
in [61,62]. It should be mentioned that more than one strategy can trainable, trainable and dynamic weighting.
be used together. For example, in the Random Forest [63,64] and
Rotation Forest [65] techniques, each decision tree can be trained 2.3.1. Non-trainable
using different feature sets, different divisions of the dataset, as Several non-trainable rules for combining classifiers have been
well as different configurations of hyper-parameters. proposed [47,79]. Examples of such aggregation methods are the
Sum, Product, Maximum, Minimum, Median and Majority voting
2.2. Selection schemes [79], Borda count [80], Behavior Knowledge space [81],
Decision Templates [82] and Dempster-Shafer combination [83,84].
For the selection stage, it can be conducted either in a static or The effectiveness of different aggregation rules have been analyzed
dynamic fashion. Fig. 3 presents the differences in the selection ap- in several works [6,47,80,85–87]. As reported in [47], the problem

proaches. In static selection methods, the EoC, C , is selected dur- with non-trainable combination rules is that they require certain
ing the training phase, according to a selection criterion estimated assumptions about the base classifiers in order to obtain a good

in the validation dataset. The same ensemble C is used to pre- performance. For instance, the Majority Voting and Product rule
dict the label of all test samples in the generalization phase. The are effective if the base classifiers are independent, while the Sum
most common selection criteria used for selecting static ensem- rule produces good results when the base classifiers have indepen-
bles are diversity [66–70] and classification accuracy [46,71]. Many dent noise behavior.
198 R.M.O. Cruz et al. / Information Fusion 41 (2018) 195–216

Fig. 3. Differences between static selection, dynamic classiﬁer selection (DCS) and dynamic ensemble selection (DES). In static selection, the EoC is selected based on the
training or validation data. In the dynamic selection approaches, the selection is based on each test data xj .

2.3.2. Trainable 2.3.3. Dynamic weighting

The rationale for trained combiners is that instead of using Dynamic weighting is essentially, in essence, similar to dynamic
just a fixed combination rule, such as the Majority Voting or the selection methods. They are all based on the local competence of
Sum rule, the combination may be adapted to the specificity of the base classifiers in the region where the query sample xj is lo-
the classification problem. In this strategy, the outputs of the base cated. However, instead of selecting a subset of classifiers, the out-
classifier are used as input features for another learning algo- puts of all classifiers are aggregated to give the final decision such
rithm, which learns the aggregation function based on the training that the most competent classifier receives a higher weight value,
data [88]. and so on.
Several works have demonstrated the fact that trainable com- Examples of dynamic weighting schemes are the local classi-
biners outperform non-trainable ones. For instance, in [55–57] an fier weighting by quadratic programming [94], the dynamic in-
MLP neural network used to combine the outputs of the base tegration of classifiers [37,95], and the fuzzy dynamic classi-
experts trained using distinct feature sets outperformed all non- fier aggregation [96]. Moreover, a hybrid dynamic selection and
trainable combination rules for the handwritten digit and char- weighting scheme is also possible [25,26,97,98]. In this approach,
acter recognition. The effectiveness of trained combination rules the base classifiers that presented a certain competence level
for small and large sample size data was studied by Raudys are first selected. Then, their decisions are weighted based on
in [89] and [90], respectively. An interesting discussion about the their estimated competence levels. Experimental results conducted
benefits of trainable combination schemes is presented by Duin in in [25,97,99] demonstrate that the hybrid dynamic selection and
the paper “The combining classifier: to train or not to train?” [47]. weighting approaches usually present the best classification per-
Another interesting trainable approach is the Mixture of Ex- formances when compared to performing only dynamic weighting.
perts (ME) [91,92]. In this methodology, the base classifiers and
the aggregation function are trained together. The problem with 2.4. The Oracle
this approaches lies in the fact they are only defined for neural
network ensembles [24]. The main foundations and algorithms in An important concept in the MCS literature is the concept of
the mixture of experts community are presented in two recent sur- the Oracle. The Oracle is an abstract model defined in [1], which
veys [92,93]. always selects the classifier that predicted the correct label, for the
given query sample, if such a classifier exists. Although it is possi-
R.M.O. Cruz et al. / Information Fusion 41 (2018) 195–216 199

Fig. 4. Taxonomy of dynamic selection systems. The taxonomy of the selection criteria is based on the previous taxonomy proposed by Britto et al. [24].

ble to achieve results higher than the Oracle by working on the sensitive to the distribution of this region [104,107,108]. Indeed,
supports given by the base classifier [21,100], from a dynamic se- many recent papers have pointed out that it is possible to improve
lection point of view, the Oracle is regarded in the literature as a the performance of DS methods just by working on better defining
possible upper limit for the performance of MCS, and as such, it these regions [102–104,109–111].
is widely used to compare the performances of different dynamic Usually, the local regions are defined using the K-NN tech-
selection schemes [101]. The Oracle can thus measure how close a nique [26,31], via clustering methods (e.g., K-Means) [29,30], us-
DCS technique is to the upper limit performance, for a given pool ing the decisions of the base classifiers [106,112,113] or a com-
of classifiers, and indicates whether there is still room for improve- petence map that is defined through the use of a potential func-
ment in terms of classification accuracy. tion [25,114]. In all cases, a set of labeled samples, which can be
It has however been shown that there is a significant perfor- either the training or validation set, is required. This set is called
mance gap between DS schemes and the Oracle [26,27,101]. Didaci the dynamic selection dataset (DSEL) [27].
et al. [101] stated that the Oracle is too optimistic to be consid-
ered as an upper bound for dynamic selection techniques. In fact, 3.1.1. Clustering
the Oracle can correctly classify instances that should not be cor- In techniques that use clustering to define the region of com-
rectly classified based on the Bayesian decision theory [21]. petence [29,30,115,116], the first step is to define the clusters in
DSEL. Next, the competence of each base classifier is estimated for
3. Dynamic selection
all clusters. During the generalization stage, given a new test sam-
ple, xj , the distance between the test sample and the centroid of
In dynamic selection, the classification of a new query sample
each cluster is calculated. The competence of the base classifiers
usually involves three steps:
are then measured based on the samples belonging to the nearest
1. Definition of the region of competence; that is, how to define cluster.
the local region surrounding the query, xj , in which the compe- The advantage of using the clustering technique is that all the
tence level of the base classifiers is estimated. rankings and classifier selections are estimated during the train-
2. Determination of the selection criteria used to estimate the ing phase. For each cluster, the EoC is defined a priori. Hence, DS
competence level of the base classifiers, e.g., Accuracy, Proba- techniques based on clustering are much faster during the gener-
bilistic, and Ranking. alization phase. In addition, only the distance between the query
3. Determination of the selection mechanism that chooses a sin- sample and the centroids of each cluster needs to be estimated,
gle classifier (DCS) or an ensemble of classifiers (DES) based on rather than all instances in DSEL.
their estimated competence level.
Fig. 4 presents the taxonomy of DS, considering these three as- 3.1.2. K-Nearest Neighbors
pects. DS methods can be improved by working on each of these In the case of the K-NN technique, the K-Nearest neighbors of
points. For instance, the approaches proposed in [102–104] aim the query sample, xj , are estimated, using the dynamic selection
to improve DS techniques by obtaining better estimates of the dataset (DSEL). The set with the K-Nearest Neighbors is called the
regions of competence, while several works are based on new region of competence and is denoted by θ j = {x1 , . . . , xK }. Then,
criteria for estimating the competence level of the base classi- the competence of the base classifiers is estimated taking into ac-
fiers [26,27,30,31,34,105,106]. These three aspects are detailed in count only the instances belonging to this region of competence.
the following sections. The advantage of using K-NN over clustering is that K-NN al-
lows a more precise estimation of the local region, which leads to
3.1. Region of competence definition many different configurations of EoC according to the classification
of the new instances [30,116]. However, there is a higher computa-
The definition of a local region is of fundamental importance to tional cost involved when using K-NN rather than clustering, since
DS methods, since the performance of all DS techniques is very the distance between the query and the whole DSEL needs to be
200 R.M.O. Cruz et al. / Information Fusion 41 (2018) 195–216

estimated prior to estimating the classifiers’ competence. This is a region defined in the decision space are the Multiple Classifier Be-
problem especially when dealing with large sized datasets [107]. havior (MCB) [112], K-Nearest Output Profiles (KNOP) [106,121] and
Since the definition of the region of competence plays a very META-DES [27,99].
important role in the accuracy of DS techniques, some works have
evaluated different versions of the K-NN algorithm for a better es-
timation of such regions. In [107], the authors considered an adap- 3.2. Selection criteria
tive K-NN proposed in [117,118], which shifts the region of com-
petence from the class border to the class centers. Samples that The criterion used to measure the competence level of the base
are more likely to be noise were less likely to be selected to com- classifiers for the classification of xj is a key component of any dy-
pose the region of competence. In [39], the authors used the K- namic selection technique. Based on [24], the criteria can be orga-
Nearest Neighbors Equality (K-NNE) [119] to estimate the regions nized into two groups (Fig. 4): individual-based and group-based
of competence. Didaci and Giacinto [103] evaluated the impact of measures. The former presents the measures where the individual
an adaptive neighborhood for dynamic classifier selection tech- performance of the base classifier is used to estimate its level of
niques. They evaluated the choice of the better suited distance competence. The competence of each base classifier ci is measured
metric to compute the neighborhood as well as a suitable choice independently of the performance of the other base classifiers in
of neighborhood size. the pool. This category can be divided into several other subgroups
according to the type of information that is used to measure the
3.1.3. Potential function model competence of the base classifiers, namely, Ranking [31,33], Ac-
These methods are inspired by the work of Rastrigin and Eren- curacy [31,102], Probabilistic [25,66,122], Behavior [106,112], Ora-
stein [114], which is one of the first works to provide a method- cle [26], Data complexity [123] and Meta-learning [27,124,125]. It
ology for dynamic selection. They differ from the majority of the must be mentioned that the system based on meta-learning, how-
other techniques in the DS literature in that they use the whole ever, presents a different perspective of how the competence of a
dynamic selection dataset for the computation of competence, base classifier can be “learned” based on different sources of infor-
rather than just only the neighborhood of the test sample. How- mation.
ever, the influence of each data point in xk ∈ DSEL is weighted The group-based measures are composed of metrics that take
by its Euclidean distance to the query xj using a potential func- into account the interaction between the classifiers in the pool.
tion model. Usually, a Gaussian potential function is considered This category can be further divided into three subgroups [24]:
(Eq. (1)). Hence, the points that are closer to the query have a Diversity [30,71], Data Handling [126] and Ambiguity [127]. These
higher influence on the estimation of the classifiers’ competence. measures are not directly related to the notion of competence of a
base classifier, but rather, to the notion of relevance, i.e., whether
K (xk , x j ) = exp(−d (xk , x j )2 ) (1) the base classifier works well in conjunction with other classifiers
in the ensemble. In these techniques, they are based on the per-
Several DS techniques have been proposed using the potential formance of the base classifier in relation to the performance of
function model: Dynamic Ensemble Selection based on Kullback– a pre-selected ensemble of classifiers. For instance, in [30], first,
Leibler divergence (DES-KL) [34], the technique based on the ran- an ensemble with the most accurate classifiers is selected (thus,
domized reference classifier (RRC) [25], the DCS methods based on the local accuracy is the criterion used to select the most compe-
logarithmic and exponential functions [120]. tent classifiers individually). Next, the system checks the base clas-
Using this class of methods to define the regions of competence sifiers that are more diverse, in relation to the pre-selected base
has the advantage of removing the need to set the neighborhood classifiers, in order to add more diversity to the EoC.
size a priori as the potential function K(xk , xj ) is used to reduce
the influence of each data point based on its Euclidean distance to
the query. However, its drawback is the increased computational 3.3. Selection approach
cost involved in computing the competence of the base classifier
since the whole DSEL, and not just the neighborhood of the query Regarding the selection approach, dynamic selection techniques
sample, is used for the competence estimation. can select either a single classifier, dynamic classifier selection
(DCS) or select an ensemble of classifiers (dynamic ensemble se-
3.1.4. Decision space lection (DES)). Early works in dynamic selection started with the
The DS techniques in this category are based on the behav- selection of a single classifier rather than an ensemble of classi-
ior of the pool of classifiers using the classifiers’ predictions as fiers (EoC). In such techniques, only the classifier that attained the
information. They are inspired by the Behavior Knowledge Space highest competence level is used for the classification of the given
(BKS) [81], often called the “decision space” [106,112], since it is test sample. Examples of DCS methods are the A Priori and A Pos-
based on the decisions made by the base classifiers. teriori methods [101], as well as the Multiple-Classifier Behaviour
An important aspect of this class of techniques is the transfor- (MCB) [112].
mation of the test and training sample into output profiles. This However, given the fact that selecting only one classifier can be
transformation can be conducted by using the hard decisions of highly error-prone, some researchers decided to select a subset of
the base classifiers (e.g., the class labels predicted), such as in the the pool of classifiers rather than just a single base classifier. All
BKS method, or by using the estimated posterior probabilities of base classifiers that obtained a certain competence level are used
the base classifiers, as suggested in [106,113,121]
. The output pro- to compose the EoC, and their outputs are aggregated to predict
file of an instance xj is denoted by x˜ j = x˜ j,1 , x˜ j,2 , . . . , x˜ j,M , where the label of xj . Examples of DES techniques are the K-Nearest Ora-
each x˜ j,i is the decision yielded by the base classifier ci for the cles (KNORA) [26], the K-Nearest Output Profiles (KNOP) [106], the
sample xj . method based on the Randomized Reference Classifier (RRC) DES-
Then, the competence region is calculated by the similarity be- RRC [25] and the META-DES framework [27,99]. Another reason for
tween the output profile of the query, x˜ j , and the output profiles selecting an EoC rather than a single classifier model is that, fre-
of the samples in DSEL. The set with the most similar output pro- quently, several base classifiers present the same competence level
files, denoted by φ j , is used to estimate the competence level of locally. In such cases, the question then is why all of them are not
the base classifiers. Examples of techniques that use a competence selected, rather than one being randomly chosen.
R.M.O. Cruz et al. / Information Fusion 41 (2018) 195–216 201

Table 1
Categorization of DS methods. They are based on their year of publication. All methods presented in this table are later considered in our comparative study (Section 6).

Technique Region of competence deﬁnition Selection criteria Selection approach Reference Year

Classifier Rank (DCS-Rank) K-NN Ranking DCS Sabourin et al. [33] 1993
Overall Local Accuracy (OLA) K-NN Accuracy DCS Woods et al.[31] 1997
Local class accuracy (LCA) K-NN Accuracy DCS Woods et al.[31] 1997
A Priori K-NN Probabilistic DCS Giacinto[129] 1999
A Posteriori K-NN Probabilistic DCS Giacinto[129] 1999
Multiple Classifier Behavior (MCB) K-NN Behavior DCS Giacinto et al.[112] 2001
Modified Local Accuracy (MLA) K-NN Accuracy DCS P.C. Smits[32] 2002
DES-Clustering Clustering Accuracy & Diversity DES Soares et al.[30,116] 2006
DES-KNN K-NN Accuracy & Diversity DES Soares et al.[30,116] 2006
K-Nearest Oracles Eliminate (KNORA-E) K-NN Oracle DES Ko et al.[26] 2008
K-Nearest Oracles Union (KNORA-U) K-NN Oracle DES Ko et al.[26] 2008
Randomized Reference Classifier (RRC) Potential function Probabilistic DES Woloszynski et al.[25] 2011
Kullback-Leibler (DES-KL) Potential function Probabilistic DES Woloszynski et al.[34] 2012
DES Performance (DES-P) Potential function Probabilistic DES Woloszynski et al.[34] 2012
K-Nearest Output Profiles (KNOP) K-NN Behavior DES Cavalin et al.[106] 2013
META-DES K-NN Meta-Learning DES Cruz et al.[27] 2015
META-DES.Oracle K-NN Meta-Learning DES Cruz et al.[130] 2016
Dynamic Selection On Complexity (DSOC) K-NN Accuracy & Complexity DCS Brun et al.[123] 2016

4. Dynamic selection techniques 4.3. Local classiﬁer accuracy (LCA)

In this section, we present a review of the most relevant dy- The LCA technique [31] is similar to the OLA, with the only dif-
namic selection algorithms. The DS techniques were chosen taking ference being that in the former, the local accuracy is estimated
into account their importance in the literature by the introduction in respect of output classes ωl (ωl is the class assigned for xj by
of new concepts in the area (i.e., methods that introduced differ- ci ) for the whole region of competence (Eq. (3)). The classifier pre-
ent ways of defining the competence region or selection criteria), senting the highest competence level, δ i, j , is selected to predict the
their number of citations, as well as the availability of source code. label of xj .
Minor variations of an existing technique, such as different ver-
P ( ωl | x k , ci )
sions of the KNORA-E technique proposed in [102] and [98], were δi, j = xKk ∈ωl (3)
not considered. Furthermore, we gave more emphasis to the tech- k=1 P ( ωl | x k , ci )
niques proposed in the last four years, since they were published
after the last reviews in MCS [21,22,24,128]. 4.4. A Priori
Table 1 categorizes the key dynamic selection techniques de-
scribed in this review according to our proposed taxonomy. These The A Priori method [129] considers the probability of correct
techniques are used in our experimental evaluation conducted in classification of the base classifier ci , in θ j taking into account
Section 6. They are detailed in the next sections 2 . Moreover, in the supports obtained by the base classifier ci . Hence, the vector
Section 4.19, we present the use of DS techniques in different pat- containing the posterior probabilities for each class is considered
tern recognition contexts (e.g., One-Class Classification and One- instead of only the label assigned to each xk ∈ θ j . Moreover, this
Versus-One decomposition). method also weights the influence of each sample, xk , in the region
of competence according to its Euclidean distance to the query xj .
The closest samples have a higher influence on the computation of
4.1. Modified Classifier Ranking (DCS-Rank) the competence level δ i, j . Eq. (4) demonstrates the calculation of
the competence level δ i, j using the A Priori method:
In the Modified Classifier Ranking method [31,33], the ranking K
P (ωl | xk ∈ ωl , ci )Wk
of a single base classifier ci is simply estimated by the number δi, j = k=1
K (4)
of consecutive correctly classified samples in the region of compe- k=1 Wk
tence θ j . The classifier that correctly classifies the highest number The classifier with the highest value of δ i, j is selected. However,
of consecutive samples is considered to have the highest “rank”, this selected classifier is only used to predict the label of xj if its
and is selected as the most competent classifier for the classifica- competence level is significantly better than that of the other base
tion of xj . classifiers in the pool (i.e., when the difference in competence level
is higher than a predefined threshold). Otherwise, all classifiers in
the pool are combined using the majority voting rule.
4.2. Overall Local Accuracy (OLA)
4.5. A Posteriori
In this method [31], the level of competence, δ i, j , of a base clas-
sifier ci is simply computed as its classification accuracy in the re- The A Posteriori method [129] works similarly to the A Priori.
gion of competence θ j (Eq. (2)). The classifier presenting the high- The only difference is that it takes into account the class predicted
est competence level is selected to predict the label of xj . by the base classifier ci , for the test sample xj during the compe-
tence estimation (Eq. (5)).
1
K
δi, j = P ( ωl | x k ∈ ωl , ci ) (2) P (ωl | xk , ci )Wk
K δi, j = xKk ∈ωl (5)
k=1
k=1 P (ωl | xk , ci )Wk
The classifier with the highest value of δ i, j is selected. As in the
2
Code for all 18 DS the techniques is available upon request. A Priori method, the selected classifier is only used to predict the
202 R.M.O. Cruz et al. / Information Fusion 41 (2018) 195–216

label of xj if its competence level is significantly better than that 4.9. DES-KNN
of the other base classifiers in the pool (i.e., when the difference
in competence level is higher than a predefined threshold). Oth- The first step in this technique is to compute the region of com-
erwise, all classifiers in the pool are combined using the majority petence θ j . Then, the base classifiers are ranked in decreasing or-
voting rule. der of accuracy and in increasing order of diversity based on the
samples belonging to θ j . The Double Fault measure [67] was used,
4.6. Multiple Classifier Behavior (MCB) since it presented the highest correlation with ensemble accuracy
in the study conducted by Shipp and Kuncheva [131]. Then, the N
The MCB technique is based on the behavior knowledge space most accurate classifiers and the J most diverse classifiers are se-

(BKS) [81] and the classifier local accuracy. Given a new test sam- lected to compose the EoC, C . The values of J and N, (J ≤ N) must
ple xj , its region of competence, θ j , is estimated. Next, the output be defined prior to applying this method.
profiles of the test sample as well as those of the region of com-
petence are computed using the BKS algorithm.
4.10. KNORA-Eliminate
The similarity between the output profile of the test sample x˜ j
and those from its region of competence, x˜ k ∈ θ j , are calculated
The KNORA-Eliminate technique [26] explores the concept of
(Eq. (6)). Samples with similarities lower than a predefined thresh-
Oracle, which is the upper limit of a DCS technique. Given the re-
old are removed from the region of competence θ j . Hence, the size
gion of competence θ j , only the classifiers that correctly recognize
of the region of competence is variable, since it also depends on
all samples belonging to the region of competence are selected. In
the degree of similarity between the query sample and those in its
other words, all classifiers that achieved a 100% accuracy in this
region of competence. After all the similar samples are selected,
region (i.e., that are local Oracles) are selected to compose the en-
the competence of the base classifier, δ i, j , is simply estimated by
semble C . Then, the decisions of the selected base classifiers are
its classification accuracy in the resulting region of competence.
aggregated using the majority voting rule. If no base classifier is
1 selected, the size of the region of competence is reduced, and the
M
S(x˜ j , x˜ k ) = T ( x j , xk ) (6) search for the competent classifiers is restarted.
M
i=1

1 if ci ( x j ) = ci ( x k ), 4.11. KNORA-Union
T ( x j , xk ) = (7)
0 if ci ( x j ) = ci ( x k ).
The KNORA-Union technique [26] selects all classifiers that are
Similar to the A Priori and A Posteriori techniques (Sections 4.4 able to correctly recognize at least one sample in the region of
and 4.5), the decision is made as follows: If the selected classi- competence. This method also considers that a base classifier can
fier is significantly better than the others in the pool (difference in participate more than once in the voting scheme when it correctly
competence level is higher than a predefined threshold), it is used classifies more than one instance in the region of competence. The
for the classification of xj . Otherwise, all classifiers in the pool are number of votes of a given base classifier ci is equal to the num-
combined using the majority voting rule. ber of samples in the region of competence, θ j , for which it pre-
dicted the correct label. For instance, if a given base classifier ci
4.7. Modified Local Accuracy (MLA) predicts the correct label for three samples belonging to θ j , it gains
three votes for the majority voting scheme. The votes collected by
Proposed by Smits [32], this technique aims to solve the prob- all base classifiers are aggregated to obtain the ensemble decision.
lem of defining the size of the region of competence (i.e., the num-
ber of instances selected to compose the region of competence).
4.12. Randomized reference classifier (DES-RRC)
When the value of K is too high, instances that are not similar to
xj may be included in the region of competence, while a low value
This method is based on the randomized reference classifier,
of K may lead to insufficient information. To tackle this issue, the
in order to decide whether or not the base classifier ci performs
MLA algorithm weights each instance in θ j by its distance to xj
significantly better than the random classifier. The level of com-
(Eq. (8)):
petence of ci is computed based on two parts: a source of com-

K petence Csrc and a Gaussian potential function K(xk , xj ) (Eq. (1)),
δi, j = P (ωl | xk ∈ ωl , ci )Wk (8) which is used to reduce the influence of each data point in DSEL
k=1 based on its Euclidean distance to xj . Thus, the competence level
The classifier presenting the highest competence level, δ i, j , is of a base classifier, ci , for the classification of the query, xj , is esti-
selected to predict the label of xj . mated using Eq. (9).

δi, j = Csrc K (xk , x j ) (9)
4.8. DES-clustering (DES-KMEANS)
xk ∈DSEL

In this method [30], the K-Means algorithm is applied to DSEL The source of competence Csrc is estimated based on the con-
in order to sub-divide this set into several clusters. For each clus- cept of randomized reference classifier (RRC) proposed in [105]3 .
ter produced, the classifiers are ranked in decreasing order of ac- The base classifiers that presented a competence level higher than

curacy and in increasing order of diversity. The Double Fault mea- the random classifier are selected to compose the ensemble C . The
sure [67] is used to measure the diversity of the base classifiers. base classifiers with a level of competence δ i, j higher than the
The N most accurate and the J most diverse classifiers are associ- competence of a random classifier 1L are selected to compose the
ated to each cluster.
ensemble C
Given a new sample xj of unknown class, its Euclidean distance
to the centroid of each cluster is calculated. Then, the set of N
most accurate and J most diverse classifiers associated to the near- 3
The Matlab code for this technique is available at: http://www.mathworks.com/

est cluster is used to compose the ensemble of classifiers C . matlabcentral/fileexchange/28391- a- probabilistic- model- of- classifier- competence.
R.M.O. Cruz et al. / Information Fusion 41 (2018) 195–216 203

4.13. Dynamic ensemble selection performance (DES-P) • The meta-features are encoded into a meta-features vector vi, j .
• A meta-classifier λ is trained based on the meta-features vi, j
Proposed by Woloszynski et al. [34], this method works as fol- to predict whether or not ci will achieve the correct prediction
lows: First, the local performance of a base classifier ci is calculated for xj , i.e., if it is competent enough to classify xj .
using the region of competence θ j . The competence of the base
classifier is then calculated by the difference between the accuracy In other words, a meta-classifier, λ, is trained, to predict
of the base classifier ci , in the region of competence θ j (denoted whether a base classifier ci is competent enough to classify, a given
test sample xj . After the pool of classifiers is generated, the frame-
by Pˆ(ci | θ j )), and the performance of the random classifier, that is,
work performs a meta-training stage, in which, the meta-features
the classification model that randomly chooses a class with equal
are extracted from each instance belonging to the training and
probabilities. For a classification problem with L classes, the perfor-
the dynamic selection dataset (DSEL). Then, the extracted meta-
mance of the random classifier is (RC = 1L ). Hence, the competence
features are used to train the meta-classifier λ. Thus, the advan-
level δ i, j in this technique is calculated according to Eq. (10).
tage of using meta-learning is that multiple criteria can be encoded
1 as different sets of meta-features in order to estimate the compe-
δi, j = Pˆ(ci | θ j ) − (10)
tence level of the base classifiers. In addition, the selection rule
L
The base classifiers with a positive value of δ i, j , i.e., that obtain is learned by the meta-classifier using the meta-features extracted
a local accuracy higher than the random classifier, are selected to from the training data.
compose the ensemble C . When an unknown sample, xj , is presented to the system, the
meta-features are calculated according to xj , and presented to the
meta-classifier. The competence level δ i, j of the base classifier ci
4.14. Kullback–Leibler divergence (DES-KL)
for the classification of xj is estimated by the meta-classifier.
The impact of the meta-classifier as well as a variation of
The DES-KL method [34] measures the competence of the base
the META-DES framework was evaluated in [97]. Four classifier
classifiers from an information theory perspective. For each in-
models for the meta-classifier were evaluated: MLP, SVM, a Naive
stance, xk , from the whole DSEL, the source of competence Csrc
Bayes (NB) classifier and Random Forest (RF). Experimental results
is calculated as the Kullback–Leibler (KL) divergence between the
demonstrated that the performance of the MLP, SVM and NB were
uniform distribution and the vector of class supports, S(xk ) =
statistically equivalent. However, since an NB obtained the highest
{S1 (xk ), . . . , SL (xk )}, estimated by the base classifier, ci . Then, a
number of wins in the experimental analysis, it was selected as
Gaussian potential function is applied to weight the source of com-
the overall best classification model for the meta-classifier. Three
petence based on the Euclidean distance between xk and the query
versions of the META-DES framework were evaluated: Dynamic se-
sample xj (Eq. (11)).
lection, Dynamic weighting and Hybrid. In the dynamic selection

δi, j = Csrc exp(−d (xk , x j )2 ) (11) approach, only the classifiers that attain a certain level of compe-
xk ∈DSEL tence are used to classify a given query sample. In the dynamic
weighting approach, all classifiers in the pool are used for classi-
Since the KL divergence is always positive, the signal of Csrc is
fication; however, their decisions are weighted based on their es-
set as positive if the base classifier ci predicted the correct label
timated competence levels. Classifiers that attain a higher level of
for the instance xk , and negative otherwise. After computing the
competence for the classification of the given query sample have
KL divergence for all samples in DSEL, the base classifiers, ci , with
a greater impact on the final decision. The hybrid approach con-
a positive value of δ i, j are selected to compose the EoC, C .
ducts both steps: first, the base classifiers that obtained a certain
competence level are selected; then, their outputs are aggregated
4.15. K-Nearest Output Profiles (KNOP) using a weighted majority voting scheme based on their respective
competence level.
The K-Nearest Output Profiles (KNOP) technique [106] works
similarly to the KNORA-U technique, with the difference being that
KNORA-U works in the feature space, while KNOP works in the 4.17. META-DES.Oracle (META-DES.O)
decision space. First, the output profiles’ transformation is applied
over the input xj , giving its output profile x˜ j . Then, the similar- An improvement to the META-DES was proposed in [99]. In this
ity between x˜ j and the output profiles from the dynamic selec- new version of the framework, a total of 15 sets of meta-features
tion dataset, is computed and stored in the set, φ j . Similarly to the were considered. Following that, a meta-features selection scheme
KNORA-U rule, each time a base classifier performs a correct pre- using a Binary Particle Swarm Optimization (BPSO) was conducted
diction, for a sample belonging to φ j , it gains one vote. The votes in order to optimize the performance of the meta-classifier, λ.
obtained by all base classifiers are aggregated to obtain the ensem- The difference between the level of competence estimated by the
ble decision. meta-classifier and that estimated by the Oracle was used as the
fitness function for the BPSO meta-features selection scheme, so
that the difference between the behavior of the meta-classifier and
4.16. META-DES
that of the Oracle in estimating the competence level of the base
classifiers was minimized. The new framework was called META-
The META-DES framework is based on the assumption that the
DES.Oracle since it was based on the Oracle definition.
dynamic ensemble selection problem can be considered as a meta-
Experimental results conducted in [99] demonstrated that the
problem [124]. This meta-problem uses different criteria regarding
META-DES.Oracle dominates the classification results compared
the behavior of a base classifier ci , in order to decide whether it
against previous DES techniques. Its performance is statistically
is competent enough to classify a given test sample xj . The meta-
better when compared to any of the 10 state-of-the-art techniques,
problem is defined as follows [27]:
including the META-DES. This can be explained by two factors:
• The meta-classes are either “competent” (1) or “incompetent” state-of-the-art DES techniques are based only on one criterion to
(0) to classify xj . estimate the competence of the base classifier; this criterion could
• Each set of meta-features fi corresponds to a different criterion be local accuracy, ranking, probabilistic models, etc. In addition,
for measuring the level of competence of a base classifier. through the BPSO meta-features selection scheme, only the meta-
204 R.M.O. Cruz et al. / Information Fusion 41 (2018) 195–216

features that are relevant for the given classification problem are 4.19. Dynamic selection in different contexts
selected and used for the training of the meta-classifier λ.
4.19.1. One-Versus-One decomposition (OVO)
Another context where dynamic selection has recently shown a
4.18. Dynamic Selection on Complexity (DSOC) lot of promise is in One-Versus-One (OVO) decomposition strate-
gies [133]. OVO works by dividing a multi-class classification prob-
Brun et al. [123] proposed an interesting dynamic classifier se- lem into as many binary problems as there are combinations be-
lection approach which takes into account data complexity mea- tween pairs of classes [133]. Each base classifier is trained solely to
sures from Ho and Basu [132], together with the local accuracy distinguish between one pair of classes. When a new query sample
estimates of the base classifiers, to perform dynamic selection. is presented for classification, the outputs of all base classifiers are
The proposed system is called Dynamic selection on complexity combined to predict its label. However, since each base classifier
(DSOC). is only trained for a pair of classes, the majority of base classifiers
DSOC aims to select the base classifier ci that presents not only might not even be trained for the corresponding class, and their
a high local performance, but also that was trained using a data decision may hinder the performance of the system. This is called
distribution that presents similar complexity measures, regarding “the non-competent classifier” problem, which is a crucial problem
the shape of the decision boundary and the overlap between the in OVO strategies [133].
classes, extracted from the neighborhood of the query sample xj . Dynamic selection represents an interesting way of solving this
Three complexity measures were considered: The Fisher’s discrimi- non-competent classifier problem, since it provides a methodology
nant ratio (F1), the intra/inter class ratio (N2) and the non-linearity for estimating the competence of the base classifiers on-the-fly,
of the 1-NN classifier (N4). More details about these complexity while thus, avoiding non-competent classifiers which may hinder
metrics are given in [132]. the system decision during the generalization phase. Five dynamic
The base classifier is selected taking into consideration three selection techniques were proposed in this context:
features:

• f1: The similarity in terms of complexity, which is measured by

the differences in F1, N2 and N4 between the training data DSi • The Dynamic-OVO strategy [40] was the first use of DS in OVO.
and the region of competence of θ j . This method works by applying the K-Nearest Neighbors al-
• f2: The distance of xi and the centroid of its class, predicted by gorithm to find the neighborhood of the query sample. Then,
the base classifier ci in its training data DSi . only the classifiers that were trained considering the classes
• f3: The local classifier accuracy based on the region of compe- presented in this neighborhood are used in the combination
tence θ j . scheme. Hence, a base classifier is considered competent if it
was trained with any of the classes presented in the neighbor-
where f1 and f2 are related to the concept of complexity, and can hood of the query sample.
be seen as measures of pertinence of the base classifier, while f3 • The Distance-based Relative Competence Weighting combina-
represents the local competence of the base classifier. The three tion (DRCW-OVO) [134] is an updated version of the Dynamic-
features are combined using Eq. (12). The base classifier with the OVO [40] to further reduce the impact of non-competent clas-
highest value of δ i, j is selected for the classification of xj . sifiers by using a weighting mechanism. The outputs of the se-
lected classifiers are weighted depending on the closeness of
δi, j = (1 − f1i ) + (1 − f2i ) + f3i (12) the query instance to the nearest neighbors of each class in the
problem. The greater the distance, the lower the weight of the
One important aspect covered in [123] is the size of the region classifier, and vice versa.
of competence θ j , since a large number of samples from at least • The DYNOVO technique [39] performs dynamic classifier selec-
two different classes is required in order to extract the data com- tion in each sub-problem of the OVO decomposition, and se-
plexity features. This is different from the other dynamic selection lects the best base classifiers to classify the query sample. In
methods, which usually consider a smaller neighborhood [27]. For this case, the Overall Local Accuracy (OLA) [31] was used in
the size of the neighborhood, the authors varied its size between each sub-problem of the OVO-decomposition. Moreover, the au-
20 and 50, considering 30 UCI classification datasets. Ultimately, thors also considered the use of the K-Nearest Neighbors Equal-
the size was set to 30, which presented the best performance. ity (K-NNE) in order to estimate the region of competence.
Although a significant boost in classification performance from • Zhang et al. [41] recently proposed the OVO-DES-P, which com-
DSOC was observed for some datasets, its overall performance was bines the OVO decomposition with the DES-Performance tech-
not significantly superior when compared to the baseline DS technique (Section 4.13). The system works by training an ensemble
niques. A study was conducted in order to understand why the of classifiers for each sub-problem in the OVO decomposition.
method works for some datasets, and not for others. When there During the generalization phase, the DES-P technique is applied
was an overlap between the complexity measured for the neigh- to select an EoC with the most competent classifiers in each bi-
borhood of the training sample and that for the training bootstraps, nary classification problem of the OVO decomposition.
the contributions of this approach were more evident. The authors • The DRCW-OVO-DES-P [41] is a combination of the OVO-DES-
pointed out that this problem is due to the classifier generation ap- P with the DRCW-OVO. The OVO-DES-P is applied to estimate
proach. As they used the Bagging technique, each bootstrap used to the competence level of each base classifier, and select an EoC
train a base classifier was chosen randomly. Therefore, there was with the most competent classifier in each binary classification
no guarantee that each bootstrap would have a significant differ- problem of the OVO decomposition. Then, the distance relative
ence in relation to the complexity measures. Hence, the authors weighting method (DRCW-OVO) is applied to weight the deci-
mentioned that future work should include the use of a different sion of the base classifiers depending on the closeness of the
pool generation model in order to obtain a better coverage of the query instance to the nearest neighbors of each class in the
problem space. Thus, it will be more likely to find a classifier that problem. The greater the distance, the lower the weight of the
was trained with a bootstrap with similar complexity measures to classifier, and vice versa. The selected base classifiers are com-
the neighborhood of the test instance. bined using the weighted majority voting scheme.
R.M.O. Cruz et al. / Information Fusion 41 (2018) 195–216 205

Table 2
Application using DS.

Application Pool generation DS methods Ref.

Credit Scoring Bagging and Boosting OLA, LCA, KNORA-E, KNORA-U [9,10]
Customer Classification Different training sets LCA, OLA, KNORA-E [126]
Music Classification Different feature sets KNORA-E, KNORA-U, OLA, LCA [8,98]
Watch list screening Different feature sets Distance based DS [7,43,44]
Face recognition Different feature sets OLA, LCA, Distance based DS [44,45]
Handwriting recognition Bagging KNORA-E, KNORA-U, KNOP, OLA, LCA [26,98,113]
Signature verification Random subspaces KNOP [42,121]
Forest species Different feature sets MCB [137]
Remote sensor images Bagging MLA [32]
Time series forecasting Heterogeneous classifiers MCB [138]
Antibiotic Resistance Bagging LCA [37]
Bioprosthetic-hand Heterogeneous classifiers DES-PRC [139–141]

Classification results demonstrated that the DRCW-OVO-DES-P between simplicity and classification accuracy. Moreover, we can
strategy outperformed the other dynamic selection approaches in see that DES methods are more popular than DCS ones, which
the context of OVO decomposition classification. may account for the many works which point out that DES tech-
niques usually achieve higher classification accuracy. Another in-
4.19.2. One-Class Classification (OCC) teresting fact is that the majority of the related works apply DS
One-class classification (OCC) is one of the most difficult prob- techniques for a pool of classifiers trained using different feature
lems in machine learning. It is based on the assumption that dur- spaces [7,8,43,45,137].
ing the training stage, only objects originating from one class are
present, with no access to counterexamples, making it difficult to 5.1. Credit scoring
train an efficient classifier, since there is no data available to prop-
erly estimate its parameters [35,135]. It can therefore be consid- Credit scoring is one of the most studied problems in pat-
ered as an ill-posed problem. As has been mentioned in several tern recognition. Its difficulty comes from the observation that the
works [27,99,106], dynamic selection techniques outperform strong problem is often heavily imbalanced, since there are way fewer
classification models, such as SVMs when dealing with ill-posed samples from customers with poor credit scores [126]. Moreover,
problems. Therefore, the application of DS in the context of OCC is high accuracy in this problem is very important, since even just 1%
expected to improve generalization performance. in improvement in classification accuracy should greatly increase
As reported in [35], dynamic selection techniques work well in the profits of financial institutions [142].
this context because during the training stage it is possible to gen- Many works consider the use of DS techniques for the credit
erate a diverse pool of classifiers. The flexibility of DS techniques scoring problem [9,10,126]. In [126], the authors proposed a dy-
then allows the most competent classifier to be selected to clas- namic ensemble selection system that takes into account the cost
sify each new test sample. Three One-Class DCS (OCDCS) methods of misclassifications of each class. The proposed approach, called
were proposed by adapting three DCS methods proposed in [136]: Dynamic Classifier Ensemble for Imbalanced Distributions (DCEID),
One-Class Entropy Measure Selection (OCDCS-EM), which is based uses cost-sensitive versions of the LCA and OLA techniques in
on the entropy measure, One-Class Minimum Difference Selec- dealing with the imbalanced nature of this problem. Another re-
tion (OCDCS-MD), and the Fuzzy Competence Selection (OCDCS- cent use of DS techniques for credit scoring was proposed by
FC). These three OCDCS methods are based on the potential func- Xiao et al. [10]. The authors proposed a new ensemble genera-
tion model to weight the decisions obtained by each sample in the tion method to increase the diversity between the members of the
reference set according to its distance to the given query instance. pool. Then, two DCS and two DES schemes were considered: LCA
An experimental evaluation demonstrated that the OCDCS- and OLA as DCS, KNORA-E and KNORA-U as DES methods. Simi-
EM works well for small datasets, while the Fuzzy Competence lar to previous works, the authors concluded that DES techniques
(OCDCS-FC) is the best choice for large datasets. However, only DCS presented the best classification performance.
methods were proposed for OCC. Since DES usually presents better An extensive comparison between DS techniques and other
results than DCS, it is reasonable to think that a next step in this classification schemes considering eight credit scoring datasets was
direction would be to adapt DES techniques for one-class classifi- conducted by Lessmann et al. [9]. However, the dynamic selection
cation problems. techniques considered in their analysis did not improve the classi-
fication accuracy when compared to traditional credit scoring ap-
5. Applications proaches.

In this section, we present a review of real-world applications 5.2. Music genre classification
using dynamic selection techniques. Moreover, we also discuss how
the authors adapt traditional DS techniques to the intrinsic charac- In [8], the authors investigated the use of dynamic ensemble
teristics of their applications; these includes aspects such as im- selection techniques for music genre classification. In their solu-
balanced distributions in customer classification and credit scor- tion, first, a pool of weak classifiers was generated, trained with
ing [126] and the lack of validation samples in face recognition distinct segments of the audio signal and different feature extrac-
applications [7,45]. tion methods. A total of 13 distinct feature extraction methods
Table 2 lists several real-world applications of DS techniques. were considered, each corresponding to a different musical aspect,
Based on usage statistics respecting the use of DS techniques in such as harmony, timbre and rhythm. In the generalization phase,
different applications, the KNORA-E and KNORA-U methods are the the KNORA-E and KNORA-U techniques were considered for the
most commonly used in different applications. This may be ex- classification of each unknown music sample. In the experiments
plained by the fact these techniques represent a good trade-off conducted using the Latin Music Dataset [143], the use of DES
206 R.M.O. Cruz et al. / Information Fusion 41 (2018) 195–216

achieved a recognition accuracy of about 70%, which significantly all cases, dynamic selection provided a better classification perfor-
improved the classification performance, as compared to the 54% mance when compared to static ensemble techniques.
seen with of the best single classifier model.
5.3.4. Forest species recognition
5.3. Image recognition Martins et al. [137] used the DS for the forest species recogni-
tion problem. The system was based on multiple features extrac-
5.3.1. Face recognition and watch list screening tion methods, such as texture (Gabor filters and Local Binary Pat-
The face recognition problem presents an interesting use of dy- terns), as well as key point-based features (SIFT and SURF) to gen-
namic selection techniques, since in such applications, we do not erate a diverse pool of classifiers. Then, several static and dynamic
have enough data available to have a dynamic selection dataset selection methods were evaluated. The MCB technique presented
(DSEL). Thus, the authors in [7,43] adapted the dynamic selection the best result, achieving a 93.03% accuracy.
techniques considering just the query sample and a pool of SVM
classifiers trained using different feature extraction methods. The
5.4. Time series forecasting
algorithm is based on the distance between the query sample and
the support vectors of the SVM classifier for the negative samples
Sergio et al. [138] proposed a dynamic selection of regressors
(i.e., face images belonging to different users). The higher the dis-
for the time series forecasting. An adaptation of the MCB tech-
tance between the support vectors, the more competent the SVM-
nique for regression was used to perform the dynamic selection
Feature Extraction pair is. Therefore, the system is not only select-
steps. The proposed system was called Dynamic Selection of Fore-
ing the SVM, but also choosing which feature extraction method is
cast Combiners (DS-FC). The pool of classifiers was composed of
more suitable for the classification of an unknown face image.
four regression models: a feed-forward neural network with one
hidden layer, a feed-forward neural network with two hidden lay-
5.3.2. Handwriting recognition
ers, a deep belief network (DBN) with two hidden layers and a
Dynamic ensemble and classifier selection techniques have
support vector machine for regression (SVR).
been used to solve several image recognition problems. Ko
The proposed DS-FC was used to forecast eight time series with
et al. [26] used the KNORA-E and KNORA-U techniques for the NIST
chaotic behavior considering short and long term series. The pro-
SD19 handwritten recognition dataset. Although the performance
posed dynamic selection scheme outperformed static combination
of those techniques did not present the highest classification per-
schemes in six out of eight time series. Moreover, it also presented
formance, their classification results were among the best achieved
better results than most of the state-of-the-art time series forecast-
for this problem to that point. In addition, the proposed KNORA
ing techniques.
techniques outperformed static ensemble selection schemes such
as GA and MVE [46].
Furthermore, in [113], the authors applied the KNOP tech- 5.5. Biomedical
nique for four handwriting recognition datasets: NIST Letters, NIST
Digits, Japanese Vowels, and Arabic Digits. Experiments demon- 5.5.1. Antibiotic resistance
strated that the proposed approach outperformed the state-of- Tsymbal et al. [37] proposed a dynamic ensemble method to
the art methods for the Arabic and Japanese datasets. The ac- tackle the problem of antibiotic resistance. This is a typical ex-
curacy achieved in the NIST Letters and Digits datasets were ample of a changing environment (concept drift), where pathogen
comparable to the state-of-the-art results obtained by SVM and sensitivity may change over time as new pathogen strains de-
MLP classifiers [144,145]. However, in another publication, Cavalin velop resistance to previously used antibiotics. A dynamic classifier
et al. [106] demonstrated that the use of dynamic selection out- method was proposed to deal with this problem, with the authors
performs both SVM and MLP classifiers when the training dataset considering a variation of the LCA technique, in which the distance
is small (less than 50 0 0 examples). between the neighbors are also taken into account.
Three dynamic approaches were considered: Dynamic Voting
5.3.3. Signature verification (DV), Dynamic Selection (DS) and Dynamic Voting with Selec-
Batista et al. [42] evaluated four dynamic selection techniques tion (DVS). The methodology was evaluated considering gradual
for the signature verification problem: KNORA-UNION, KNORA- and abrupt drift scenarios. Experimental results demonstrated that
ELIMINATE and their corresponding versions using output profile, the approaches using dynamic selection presented the best per-
namely, OP-ELIMINATE and OP-UNION. The system was based on formance. Furthermore, the dynamic approaches always obtained
an ensemble of Hidden Markov Models (HMMs) used as feature a better result than the best base classifier model and ensemble
extractors from the signature image. Each HMM was trained us- technique.
ing different numbers of states and codebook sizes in order to
learn signatures from different levels of perception. The features 5.5.2. Bioprosthetic hand
extracted using HMMs were merged in a feature vector. For each Kurzynski et al. [139,147] proposed a dynamic ensemble selec-
writer, the random subspace method was used to train a pool of tion system for the recognition of electromyography (EMG) signals
100 Gaussian SVM classifiers. for the control of a bioprosthetic hand. The proposed solution was
The proposed approach was applied to two well-known signa- based on the estimation of the classifiers competence using the
ture verification datasets: GPDS [146] and Brazilian signature com- probabilistic randomized reference classifier proposed in [25]. The
posed of random, simple and skilled forgeries. Experimental re- pool of classifiers consisted of seven different classification mod-
sults demonstrated that dynamic selection can significantly reduce els, including linear and quadratic discriminant classifiers (LDC and
the overall error rates as compared to other combination methods. QDC) and MLP neural networks.
In the majority of the experiments conducted, the OP-ELIMINATE Moreover, in [140,141], the authors proposed a method for the
presented the best performance. It must be noted however, that control of a bioprosthetic hand using a two-stage MCS and DES for
the OP-UNION method worked better when the SVM classifiers the recognition of EMG and mechanomiographic (MMG) signals in-
were trained with a limited number of signature samples, since dicating the patient’s movement intention. Additionally, feedback
classifiers trained with fewer signatures are less accurate. Thus, information coming from bioprosthesis sensors was used to cali-
more classifiers are needed to form a robust EoC. Nevertheless, in brate the competence of the base classifiers estimated using the
R.M.O. Cruz et al. / Information Fusion 41 (2018) 195–216 207

Table 3
Summary of the 30 datasets used in the experiments.

Database No. of instances Dimensionality No. of classes Source

Adult 48,842 14 2 UCI

Banana 10 0 0 2 2 PRTOOLS
Blood transfusion 748 4 2 UCI
Breast (WDBC) 568 30 2 UCI
Cardiotocography (CTG) 2126 21 3 UCI
Ecoli 336 7 8 UCI
Steel Plate Faults 1941 27 7 UCI
Glass 214 9 6 UCI
German credit 10 0 0 20 2 STATLOG
Haberman’s Survival 306 3 2 UCI
Heart 270 13 2 STATLOG
ILPD 583 10 2 UCI
Ionosphere 315 34 2 UCI
Laryngeal1 213 16 2 LKC
Laryngeal3 353 16 3 LKC
Lithuanian 10 0 0 2 2 PRTOOLS
Liver Disorders 345 6 2 UCI
MAGIC Gamma Telescope 19,020 10 2 KEEL
Mammographic 961 5 2 KEEL
Monk2 4322 6 2 KEEL
Phoneme 5404 6 2 ELENA
Pima 768 8 2 UCI
Satimage 6435 19 7 STATLOG
Sonar 208 60 2 UCI
Thyroid 215 5 3 LKC
Vehicle 846 18 4 STATLOG
Vertebral Column 310 6 2 UCI
WDG V1 50 0 0 21 3 UCI
Weaning 302 17 2 LKC
Wine 178 13 3 UCI

RRC technique during the operation phase. The two-stage tech- Nearest Output Proﬁles (KNOP) [106,113], Dynamic Ensemble Se-
nique developed provided the state-of-the-art results for control lection Performance (DES-P) [34], Dynamic Ensemble Selection
of the bioprosthetic hand, considering several types of movements, Kullback–Leibler (DES-KL) [34], DES Clustering [30], DES KNN [30],
such as hook, power and pinch. Meta Learning for Dynamic Selection (META-DES) [27] and META-
DES.Oracle [99]. All the DS methods considered in this evaluation
6. Comparative study are detailed in Section 4 and summarized in Table 1.

In this section, we present an empirical comparison between 18 6.1. Datasets

state-of-the-art techniques under the same experimental protocol.
First, we compare the results of each DS techniques (Section 6.3). The comparative study was performed using a test bed com-
The DS techniques are then also compared against the baseline posed of 30 classification problems proposed in [27]. Sixteen
methods, namely, (1) Static Selection (SS), that is, the selection of datasets were taken from the UCI machine learning reposi-
an EoC during the training stage, and its combination using a ma- tory [148], four from the STATLOG project [149], four from
jority voting scheme [46]; (2) single best (SB), which corresponds the Knowledge Extraction based on Evolutionary Learning (KEEL)
to the performance of the best classifier in the pool according to repository [150], four from the Ludmila Kuncheva Collection of real
the validation data, and (3) majority voting (MV), which corre- medical data [151], and two artificial datasets generated with the
sponds to the majority voting combination of all classifiers in the Matlab PRTOOLS toolbox [152].
pool without any pre-selection of classifiers. We also compare the
performance of DS techniques with the best classification models 6.2. Experimental setup
according to [4] (Section 6.4).
We evaluate a wide range of DS techniques covering all points To ensure a fair comparison of the results obtained in this
of the proposed DS taxonomy: classifier and ensemble selection analysis with previous results from the DES literature, the same
techniques, techniques based on K-NN, clustering, and competence experimental setup from previous works [27,97] was used. For
maps through a potential function (explained in Section 3.1), as each dataset, the experiments were carried out using 20 replica-
well as different information sources (criteria), to estimate the tions. For each replication, the datasets were randomly divided as
competence level of the base classifiers, such as, local accuracy, follows: 50% for training, 25% for the dynamic selection dataset
ranking, probabilistic models and oracle (explained in Section 3.2). (DSEL) and 25% for the test set. The divisions were performed
For dynamic classifier selection (DCS), the following techniques while maintaining the prior probabilities of each class. Similar to
were evaluated: Local Class Accuracy (LCA) [31], Overall Local our previous works [27,97], the pool of classifiers C was composed
Accuracy (OLA) [31], Modified Local Accuracy (MLA) [32], Modi- of 100 Perceptrons generated using the Bagging technique [50]. The
fied Classifier Ranking (RANK) [31,33], Multiple Classifier Behavior size of the region of competence (neighborhood size) K was equally
(MCB) [112], A Priori [101,129] , A Posteriori [101,129] and the Dy- set to 7 for all the techniques based on K-NN. Table 3.
namic Selection on Complexity (DSOC). For dynamic ensemble se- For the DES-KMEANS and DES-KNN, the number of base classi-
lection, the following techniques were considered: K-Nearest Ora- fiers selected using accuracy (N) and diversity (J) was set to 30% of
cles Eliminate (KNORA-E) [26], K-Nearest Oracles Union (KNORA- the whole pool for both cases, as suggested in [30,116]. In addition,
U) [106], Randomized Reference Classifier (DES-RRC) [105], K- the number of clusters in the DES-KMEANS was set to 6 following
208 R.M.O. Cruz et al. / Information Fusion 41 (2018) 195–216

Fig. 5. Average rank of the 18 dynamic selection methods over the 30 datasets. The best algorithm is the one presenting the lowest average rank. Techniques in which the
difference in average ranks is lower than the critical difference are connected by a black bar.

Table 4 ranks is lower than the critical difference are connected by a black
Overall results considering the 30 classification datasets. The average
bar (i.e., the results are statistically equivalent according to the
ranks and accuracy for each DS technique are presented. Standard
deviation is presented in parenthesis. ranking analysis).
Based on the ranking analysis, we can clearly see that DES tech-
DS method Avg. rank DS method Mean accuracy
niques outperform DCS ones. Among the top 10 techniques, 8 are
META-DES.O 3.87(3.54) META-DES.O 83.92(9.13) DES. The only DES methods that did not achieve a lowest aver-
META-DES 4.17(2.98) META-DES 83.24(8.94) age rank in comparison with the DCS ones were the DES-KNN and
DES-RRC 5.97(4.66) DES-P 82.26(9.26)
DES-Clustering. Interestingly, these two techniques take into ac-
KNORA-U 6.83(4.11) DES-RRC 82.11(8.76)
DES-P 7.13(3.69) KNORA-U 81.69(9.82) count diversity measures to increase the diversity in the EoC, af-
DES-KL 7.73(4.92) DES-KL 81.52(8.77) ter the most competent classifiers are selected. This result may in-
KNOP 9.53(3.98) KNOP 80.81(8.92) dicate that adding diversity to EoC does not provide classification
KNORA-E 9.77(3.88) KNORA-E 80.36(10.75)
benefits at the instance level, i.e., for the classification of a single
LCA 10.10(4.66) OLA 79.87(10.67)
OLA 10.40(4.95) DCS Rank 79.69(10.38) instance. As reported in [156], we should promote the consensus
MCB 11.17(4.74) LCA 79.57(9.84) in the EoC, for the classification of a single query xj , rather than
DES-KNN 11.17(4.40) MCB 79.56(9.70) diversity. Our experimental analysis supports this hypothesis.
DSOC 11.37(5.74) DSOC 79.33(9.44) The six techniques with the lowest average rankings (DES-
A Posteriori 11.47(5.56) DES-KNN 79.29(10.23)
P, DES-KL, KNORA-U, DES-RRC, META-DES and META-DES.Oracle)
DCS Rank 11.80(4.20) A Priori 78.57(11.18)
DES-KMEANS 12.73(3.84) DES-KMEANS 78.49(10.40) were considered equivalent in this analysis. However, one problem
MLA 12.80(4.60) A Posteriori 78.14(11.53) with the ranking analysis is that the result of the comparison be-
A Priori 13.00(4.53) MLA 77.34(9.78) tween two techniques changes according to the other techniques
that were considered in the test. This problem can cause several
type I errors according to [157]. For this reason, we performed a
the experiments conducted by Soares et al. [30]. Lastly, the values new test considering only these top 6 techniques. The CD diagram
of the hyper-parameters Kp and hc for the META-DES framework considering only the top 6 techniques is shown in Fig. 6. Moreover,
were set to 5 and 80% according to the results presented in [27]. the classification results obtained for the 30 datasets by the top 6
Moreover, the BPSO optimization scheme of the META-DES.Oracle DS techniques are presented in Table 5.
was conducted using the V-Shaped transfer function [99,153]. Furthermore, for a better comparison of the top DS techniques,
we conducted two additional analysis: an n × n comparison consid-
6.3. Comparison of dynamic selection techniques ering the hypothesis of equality between all existing pairs of algo-
rithms using Bergman–Hommel procedure [158–160], and a pair-
The Friedman rank [154] test was used for the statistical com- wise test using the Wilcoxon Sign Test as recommended by Be-
parison of the dynamic selection techniques over the 30 classifi- navoli et al. [157]. The results of both tests for each pairwise com-
cation datasets. The average ranks were calculated as follows: For parison are presented in Table 6. Hypotheses that are rejected at a
each dataset, the method that achieved the best performance re- α = {0.1, 0.05, 0.01} are marked with a •, ••, and •••, respectively.
ceived rank 1, the second best rank 2, and so forth. In case of a The Bergmann–Hommel test was conducted using the JAVA code
tie, i.e., two methods presented the same classification accuracy for published by Garcia and Herrera [159]4 .
the dataset, their average ranks were summed and divided by two. Based on the results, we can see that the two versions of
The average rank was then obtained, considering all datasets. The the META-DES framework and the Randomized Reference Classifier
best performing algorithm was the one presenting the lowest av- (DES-RCC) presented the best results. According to the Bergmann–
erage rank. Next, the critical difference (CD) value was calculated Hommel test, these three techniques are statistically equivalent.
using the Bonferonni–Dunn post-hoc test recommended in [155]. However, the Wilcoxon Sign test shows that both the META-DES
The performance of two techniques was deemed statistically dif- and the META-DES.Oracle outperform the DES-RRC method, with a
ferent when their difference in average rank was higher than the level of significance α = 0.01.
critical difference CD. Table 4 presents the average accuracy of all
DS methods as well as their average ranking.
We use the critical difference diagram proposed in [155] in or-
der to have a visual illustration of the statistical test. The CD di-
agram with the results of the Bonferonni–Dunn post-hoc test is
shown in Fig. 5. Techniques in which the difference in average 4
Code available at http://sci2s.ugr.es/keel/multipleTest.zip.
R.M.O. Cruz et al. / Information Fusion 41 (2018) 195–216 209

Fig. 6. Average rank of the top six dynamic selection methods over the 30 datasets.

Table 5
Mean and standard deviation results for the top 6 DS techniques. The best results for each dataset are highlighted
in bold.

Dataset DES-RRC META-DES META-DES.O DES-KL DES-P KNORA-U

Pima 77.64(2.73) 79.03(2.24) 77.53(2.24) 77.97(2.64) 76.87(1.87) 78.84(2.18)

Liver 68.01(4.14) 70.08(3.49) 72.02(4.72) 67.11(5.62) 67.46(3.84) 61.29(3.76)
Breast 96.94(0.61) 97.40(1.07) 96.71(0.86) 97.13(0.59) 96.78(0.78) 97.41(1.02)
Blood 78.02(1.41) 79.14(1.03) 79.38(1.76) 78.83(1.09) 77.72(1.53) 79.25(3.36)
Banana 86.56(1.76) 91.78(2.68) 94.54(1.16) 85.58(2.40) 93.61(1.80) 92.40(2.87)
Vehicle 83.34(1.81) 82.75(1.70) 82.87(1.64) 82.99(1.74) 82.85(1.97) 83.12(1.70)
Lithuanian 85.34(1.57) 93.18(1.32) 94.97(2.00) 83.50(2.82) 91.90(3.49) 95.63(2.64)
Sonar 80.77(5.09) 80.55(5.39) 81.63(3.90) 78.15(6.28) 79.49(6.66) 77.34(1.94)
Ionosphere 88.80(2.48) 89.94(1.96) 89.94(1.97) 88.42(2.78) 88.42(2.62) 88.42(1.67)
Wine 98.77(1.57) 99.25(1.11) 99.52(1.11) 98.26(2.45) 98.48(2.30) 98.03(1.62)
Haberman 75.98(2.50) 76.71(1.86) 72.03(2.67) 75.04(2.17) 75.57(2.54) 74.51(2.27)
CTG 85.41(1.02) 84.62(1.08) 86.37(1.10) 84.98(1.39) 85.11(1.39) 86.22(2.20)
Vertebral 86.76(3.68) 86.89(2.46) 84.90(5.33) 84.19(6.40) 86.76(3.47) 88.03(2.24)
Faults 68.20(0.95) 67.21(1.20) 69.32(1.18) 67.91(1.29) 68.03(1.36) 68.58(1.98)
WDVG1 84.63(0.48) 84.56(0.36) 84.72(0.49) 84.61(0.53) 84.59(0.54) 84.18(1.10)
Ecoli 80.66(3.58) 77.25(3.52) 81.57(3.47) 79.95(4.30) 79.83(4.26) 77.12(3.41)
GLASS 66.04(4.23) 66.87(2.99) 66.46(4.22) 63.32(4.27) 63.13(5.30) 62.05(2.88)
ILPD 68.58(1.89) 69.40(1.64) 69.79(3.15) 69.94(2.49) 68.02(2.21) 69.86(1.58)
Adult 87.16(1.53) 87.15(2.43) 87.74(2.04) 86.06(2.90) 87.10(2.76) 80.21(2.26)
Weaning 78.96(3.29) 79.00(3.78) 81.73(3.14) 78.74(4.44) 78.08(4.07) 82.02(3.65)
Laryngeal1 87.21(3.79) 79.67(3.78) 87.42(2.98) 86.54(4.25) 86.35(3.82) 82.38(4.45)
Thyroid 97.61(0.96) 96.78(0.87) 96.99(2.14) 97.04(1.09) 96.98(1.12) 96.71(1.89)
Laryngeal3 72.74(1.87) 72.65(2.17) 73.67(0.75) 72.67(2.37) 73.34(2.81) 72.36(1.25)
German 75.83(2.36) 75.55(1.31) 76.58(1.99) 73.88(2.15) 74.72(3.50) 73.16(1.80)
Heart 83.99(3.64) 84.80(3.36) 86.44(3.38) 82.83(4.13) 83.27(3.68) 84.15(4.05)
Segmentation 96.38(0.75) 96.21(0.87) 96.65(0.83) 96.20(0.89) 96.22(0.87) 96.64(1.07)
Phoneme 74.65(1.55) 80.35(2.58) 85.05(1.08) 77.13(1.32) 81.64(0.53) 79.94(3.33)
Monk2 80.98(2.58) 83.24(2.19) 94.45(1.88) 80.85(2.68) 79.93(2.57) 77.88(4.25)
Mammographic 85.00(1.32) 84.82(1.55) 80.72(2.56) 84.12(2.26) 84.98(1.86) 82.91(2.27)
Magic 86.20(1.84) 84.35(3.27) 86.02(2.20) 83.56(1.22) 83.54(1.34) 82.99(1.25)
Average 82.11(8.76) 83.20(8.94) 83.92(9.13) 81.52(8.77) 81.29(9.08) 81.69(9.82)
Avg. rank 3.31(1.54) 2.48(0.80) 2.21(1.13) 4.50(0.98) 4.51(1.06) 3.96(1.72)

6.4. Comparison with different classification approaches those of the best classifier in the pool, Single Best (SB); the se-
lection of the best base classifiers in the pool, Static Selection (SS),
In this section, we compare the results obtained by DS tech- and the majority voting combination of all classifiers in the pool,
niques against monolithic classifier models. The objective of this Majority Vote (MV). We also included these three methods since
study is to determine whether the performance of DS methods in they are extensively used as baseline methods in the dynamic se-
relation to the best off-the-shelf classifiers. Three single classifier lection literature.
models were considered: Multi-Layer Perceptron (MLP) Neural Net- All classifiers were evaluated using the Matlab PRTOOLS tool-
work, Support Vector Machine with Gaussian Kernel (SVM) and K- box [152]. The dynamic selection dataset (DSEL) was used as the
Nearest Neighbor classifier. As ensemble methods, we considered validation set in the training process of the classifiers, and as a
the Random Forest [63] and Adaboost [52] techniques. These clas- result, all methods were trained using the same amount of data
sifiers were selected based on a recent study [4] that ranked the available. The distribution of the test set remained the same. The
best classifiers in a comparison considering a total of 179 classi- hyper-parameters of the classifiers were set as follows:
fiers over 121 classification datasets.
Furthermore, as reported by Britto et al. [24], usually, the per- 1. Single Best (SB): The base classifier with the highest classifica-
formances of dynamic selection techniques are compared with tion accuracy in the validation set is selected for classification.
210 R.M.O. Cruz et al. / Information Fusion 41 (2018) 195–216

Table 6
Pairwise comparison of the top six DS techniques. (a) Comparison with the adjusted p-values calculated using the Bergmann-Hommel procedure. (b)
Pairwise comparison using the Wilcoxon Sign-test. The hypothesis are ordered in an ascending order according to the p-value. Hypothesis that are
rejected at a α = {0.1, 0.05, 0.01} are marked with a •, ••, ••• respectively.

(a) (b)

Hypothesis Berg p Hypothesis Wilcoxon p

META-DES.O vs DES-P 2.88E−5 ••• META-DES vs DES-KL 5.21E−6 •••

META-DES.O vs DES-KL 2.88E−5 ••• META-DES vs DES-P 5.30E−5 •••
META-DES vs DES-P 2.56E−4 ••• META-DES.O vs DES-KL 4.19E−4 •••
META-DES vs DES-KL 2.56E−4 ••• META-DES.O vs DES-P 0.0010 •••
META-DES.O vs KNORA-U 0.0020 ••• DES-RRC vs DES-KL 0.0073 •••
META-DES vs KNORA-U 0.0085 ••• META-DES.O vs DES-RRC 0.0093 •••
DES-RRC vs DES-P 0.0908 • META-DES.O vs KNORA-U 0.0107 ••
DES-RRC vs DES-KL 0.0908 • META-DES vs DES-RRC 0.0136 ••
META-DES.O vs DES-RRC 0.1366 META-DES vs KNORA-U 0.0148 ••
META-DES vs DES-RRC 0.3379 DES-RRC vs DES-P 0.0159 ••
DES-RRC vs KNORA-U 0.5352 META-DES.O vs META-DES 0.0333 ••
DES-P vs KNORA-U 1.0 0 0 0 DES-KL vs DES-P 0.2844
DES-KL vs KNORA-U 1.0 0 0 0 DES-P vs KNORA-U 0.3252
META-DES.O vs META-DES 1.0 0 0 0 DES-RRC vs KNORA-U 0.3286
DES-KL vs DES-P 1.0 0 0 0 DES-KL vs KNORA-U 0.8882

2. Majority Voting (MV): The outputs of all base classifiers in the Table 7
Overall results considering the 30 classification datasets. The aver-
pool are combined using the majority voting rule [79].
age ranks and accuracy for each algorithm is presented. Standard
3. Static Selection (SS): A GA ensemble selection approach based deviation is presented in parenthesis.
on the majority voting accuracy presented in [46]. The param-
Algorithm Avg. Rank Algorithm Accuracy
eters of the GA method were set according to [46]. The valida-
tion set was used for the computation of the majority voting META-DES.O 5.43(4.92) META-DES.O 83.92(9.13)
accuracy. META-DES 5.70(4.28) META-DES 83.24(8.94)
DES-RRC 7.67(6.23) DES-P 82.26(9.26)
4. AdaBoost: We set the number of iterations of the algorithms to
DES-P 9.17(5.27) SVM 82.22(10.24)
100. The Perceptron classifier was used as the weak model. The KNORA-U 9.33(6.40) DES-RRC 82.11(8.76)
Multi-Class Adaboost [161] was used for the multi-class prob- DES-KL 9.90(6.42) KNORA-U 81.69(9.82)
lems. SVM 11.07(8.14) DES-KL 81.52(8.77)
KNOP 13.07(5.86) KNOP 80.81(8.92)
5. Random Forest (RF): The number of trees was set to 200. The
KNORA-E 13.23(5.62) RF 80.78(10.98)
number of leaves was set to the square root of the number of RF 13.77(9.50) KNORA-E 80.36(10.75)
features as recommended in [64,162,163]. LCA 14.30(6.42) OLA 79.87(10.67)
6. Multi-Layer Perceptron (MLP): We varied the number of neu- OLA 14.60(7.02) DCS Rank 79.69(10.38)
rons in the hidden layer from 10 to 100 at 10 point intervals. MV 14.93(6.62) LCA 79.57(9.84)
SS 14.97(6.38) MCB 79.56(9.70)
The configuration that achieved the best results in the valida-
MCB 15.03(7.49) MV 79.51(9.39)
tion data was used. The MLP training process was conducted AdaBoost 15.43(7.63) SS 79.40(10.12)
using the Levenberg–Marquadt algorithm [164]. The training DES-KNN 15.53(6.49) DES-KNN 79.29(10.23)
was stopped if the performance on the validation set decreased DCS Rank 16.33(5.77) AdaBoost 79.23(10.32)
A Posteriori 16.40(7.91) MLP 79.20(11.74)
or failed to improve for five consecutive epochs (early stop-
SB 16.47(6.04) SB 79.06(9.98)
ping). DSOC 16.87(7.93) DSOC 79.00(9.44)
7. Support Vector Machine with a Gaussian Kernel (SVM): A grid MLP 16.90(8.45) A Priori 78.57(11.18)
search was performed in order to set the values of the regular- 7-NN 17.40(8.59) DES-KMEANS 78.49(10.40)
ization parameter, c, and the Kernel spread parameter γ . DES-KMEANS 17.50(6.13) A Posteriori 78.14(11.53)
MLA 18.20(7.41) 7-NN 77.42(13.06)
8. K-Nearest Neighbors (K-NN): For the K-Nearest Neighbors clas-
A Priori 18.30(6.24) MLA 77.34(9.78)
sifier, we considered a neighborhood size K = 7 so that the 1-NN 20.50(8.10) 1-NN 76.64(11.98)
same neighborhood size was used for both the K-NN and the
DS techniques. In addition, we also considered the performance
of the 1-NN as the baseline for this method.

Table 7 presents the average accuracy of all classification meth- the use of a pool of SVM classifiers, such as in the following
ods, as well as their average ranking. In comparison to the baseline works [7,25,30,34,156]. It is also possible to train a Random Forest,
methods (SB, SS and MV), we can see that the majority of the DS and apply dynamic selection for classification, instead of Majority
techniques improve upon the SB (only the MLA presented both a Voting.
lower ranking and average accuracy). With respect to SS and MV, Moreover, the hyper-parameters of the SVM classifier were op-
66% of the DS considered presented better results. Furthermore, timized for each dataset, while for the DS techniques, the val-
the top DS techniques (Table 5) also presented a higher average ac- ues of the hyper-parameters were set based on previous publica-
curacy and a better ranking when compared to the Random Forests tions [26,102]. An optimization of the hyper-parameter (e.g., neigh-
and AdaBoost techniques, which are classical static ensemble tech- borhood size K for the DS methods based on the KNN), as well
niques. as the evaluation of DS techniques using a different base classi-
The SVM and RF classifier presented very high recognition ac- fier model such as SVM, could further illustrate the benefits of DS
curacy. However, it must be pointed out that the DS techniques in techniques. The use of techniques to estimate the best number of
this paper were all evaluated using a pool of weak, linear classi- classifiers in the pool, such as in [165], could also be employed for
fiers. All methods considered in this work could also benefit from improving the classification performance of DS algorithms.
R.M.O. Cruz et al. / Information Fusion 41 (2018) 195–216 211

namic selection techniques and categorize them based on the pro-

posed taxonomy. We also present a review of DS methods used in
different contexts such as One-Class Classification and One-Versus-
One decomposition, as well as the application of DS techniques to
solve complex real-world problems, such as face recognition and
music genre classification.
A comparative study is carried out to compare several DCS and
DES algorithms under the same testing conditions. The experimen-
tal analysis clearly demonstrates that DES techniques outperform
DCS ones. Eight of the top 10 techniques are DES methods. The
only exceptions are the two techniques that are based on diversity.
This finding may indicate that increasing diversity at the instance
level may not improve the generalization performance of DS. An-
other interesting finding relates to the comparison between DS and
Fig. 7. Pairwise comparison between the results achieved using the different the K-NN technique, having both techniques using the same neigh-
DS techniques and the 7-NN. The analysis is based on wins, ties and losses.
borhood. Results show that the majority of DS methods present
The vertical lines illustrate the critical values considering a confidence level α =
{0.10, 0.05, 0.01}. statistically superior performance based on the Sign test. Moreover,
the DS methods usually improve upon the baseline methods (SB
and MV), as well as on other classic classifier ensemble techniques
Another interesting observation is the dominance of DS tech- such as AdaBoost.
niques over the K-NN method. All DS techniques presented a bet- In general, the methods based on meta-learning obtained the
ter ranking and average accuracy when compared to the 1-NN, and best performance, which can be explained by the fact they use
only the MLA technique presented a lower classification accuracy several sources of information to perform the dynamic selection
and lower rank than the K-NN using the same neighborhood size. scheme. In addition, since the selector mechanism in the meta-
This is an interesting finding, since the majority of the DS tech- learning approaches is based on another classification model, these
niques in this study (14 methods) use the K-NN method in the methods also adapt to the characteristics of each classification
process of estimating the local competence of the base classifiers. problem (e.g., imbalance, noise, etc.). However, they also present a
A pairwise analysis was conducted based on the Sign test [155], significant increase in complexity, since it is not only multiple sets
computed on the number of wins, ties and losses obtained by each of meta-features that need to be extracted, but there is also the
DS, compared to the 7-NN (i.e., same neighborhood size). The null cost involved in using the meta-classifier for the local competence
hypothesis, H0 , meant that both techniques obtained statistically estimation.
equivalent results. A rejection in H0 meant that the classification Notwithstanding all recent contributions that have been made
performance obtained by a corresponding DS technique was sig- in this field, the definition of an ideal DS technique is still far from
nificantly better at a predefined significance level α . In this case, finalized. There are several open research questions that need to
the null hypothesis, H0 , is rejected when the number of wins is be properly answered, as well as challenges to further improve the
greater than or equal to a critical value, denoted by nc . The critical performance of DS techniques. Therefore, we present some per-
value is computed using Eq. (13): spectives and our points of view on these topics.
√
nexp nexp
nc = + zα (13) 7.1. The Oracle in dynamic selection
2 2
where nexp is the total number of experiments. We ran the Many authors have used the notion of the Oracle to gener-
test considering three levels of significance: α = {0.10, 0.05, 0.01}. ate a pool or ensemble of classifiers. For example, dos Santos
Fig. 7 shows the results of the Sign test, and the different bars rep- et al. [71,73] used the Oracle performance in a greedy search al-
resent the critical values for each significance level. We can see gorithm to generate the pool of classifiers. Although the proposed
that at a 0.1 significance level, all DS techniques obtained a sig- pool achieved a higher Oracle performance, the classification accu-
nificant number of wins. Using an α = 0.05, only two DS meth- racy obtained by the pool generated using the Oracle performance
ods (DS-KMEANS and MLA) did not present a significant number was inferior to the pool of classifiers generated using the majority
of wins. Moreover, even restricting the test to a significance level voting accuracy or diversity as search criteria.
of 0.01, we could see that 12 DS methods out of 18 obtained a sig- In a recent paper, Souza et al. [166] studied the relationship be-
nificant number of wins. tween the Oracle and DS schemes. To that end, a new pool gen-
We therefore believe that this point should be further investi- eration method was proposed, which guarantees that the Oracle
gated in order to understand why and when DS methods that are performance is always 100% for the training dataset. The analysis
based on K-NN present a significant boost in classification accu- showed that even though the Oracle for the training dataset is al-
racy even when the same neighborhood size is considered in both ways 100%, the performance of the DCS techniques for the train-
approaches. ing set (in memorization) had about an 85% accuracy. That means
that although the presence of the correct classifier is guaranteed
7. Conclusion and perspectives in the pool, the DCS techniques struggled to select correct classi-
fier deemed by the Oracle.
In this paper, we presented an updated taxonomy of dynamic This difficulty is characterized by the notable difference in ac-
selection systems. The key points of a dynamic selection systems curacy rates between the Oracle and the DCS techniques reported
are analyzed: 1) the methodology used for the definition of the re- in several works [26,27,101], which suggests that the Oracle, as in-
gion of competence used to estimate the local competences of the tuitive as it may be, is not the best guide for measuring the perfor-
base classifiers; 2) the source of information used to estimate the mance of a given classifier pool for dynamic selection. The reason
competence of the base classifiers, and 3) The selection approach for this is that the Oracle perceives the classification problem glob-
used to determine whether a single base classifier or an ensemble ally, whereas DS techniques rely on local information to select the
of classifiers is selected. Then, we present the state-of-the-art dy- best classifier for each test instance. Thus, the Oracle information
212 R.M.O. Cruz et al. / Information Fusion 41 (2018) 195–216

may not be very much relevant for dynamic selection schemes. whether or not the query sample is located in region with border-
Based on this analysis, the authors propose a new metric, called line samples of different classes (called indecision region). In the
the Hit-rate, which takes into account the local information from context of DS, a sample is located in an indecision region when
the DS methods. They argue that the Hit-rate should be used in- its region of competence contains instances from different classes.
stead of the Oracle, as it covers both local and global information The authors demonstrated that in such cases, many DS techniques
regarding the given pool of classifiers. can select classifiers with decision boundaries that do not cross
the region of competence, assigning all samples in the region of
competence to the same class. This may cause problems specially
7.2. Pool generation
when dealing with imbalanced datasets, in which the majority of
the samples in the region of competence belong to a single class
In the majority of DS publications, the pool of classifiers is
(majority class).
generated using either well known ensemble generation methods
In order to deal with this problem, an online pruning frame-
such as Bagging, or by using heterogeneous classifiers [25,34]. The
work was proposed in [109], which pre-selects classifiers with de-
problem with such generation approaches is that they were pro-
cision boundaries that crosses the region of competence of the test
posed for static combination methods. In other words, they use a
instance, when the test instance is located in an indecision region.
global approach in generating the base classifiers. Since these tech-
The first step of the framework is to determine whether or not
niques look at the problem globally rather than locally, they do
query instance is located in an indecision region. If yes, the prun-
not guarantee the presence of local experts. For this reason, the
ing mechanism is employed to pre-select the classifiers that cross
DS methods may not be able to select the competent classifiers lo-
the region of competence. Then, a DS technique is applied to se-
cally [166]. To the best of our knowledge, there is no classifier gen-
lect the most competent classifiers among the pre-selected pool. If
eration procedure that is adapted to dynamic selection techniques.
the query is located in a safe region, the DS technique is used for
We believe that the definition of a classifier generation procedure
classification. The proposed online pruning framework significantly
that takes into account the local information is a very promising
improved the classification accuracy of the 9 DS techniques consid-
research direction in order to improve the performance of all DS
ered in the experimental analysis. This result demonstrates that in
techniques.
order to be considered competent, a base classifier should not only
Another research direction in terms of the generation of a pool
obtain a high competence level, estimated by the DS method, but
of classifiers concerns the pool size. Normally, a large pool of clas-
must also cross the region of competence in the case of indecision
sifiers is considered when DS techniques are evaluated. For in-
regions.
stance, in [27,123,167] a pool composed of 100 base classifiers was
However, one of the problems pointed out in this work is the
used, while in [156], a pool composed of 10 0 0 SVMs was em-
fact that some samples were located in an indecision region, and
ployed. An interesting work conducted by Roy et al. [165,168] pro-
had no base classifiers with decision boundaries crossing its region
posed a meta-regression model to predict the best size of the
of competence, which lead us to think that an ensemble generation
pool of classifiers based on complexity measures extracted from
technique that maximizes the number of base classifiers with deci-
the classification problem. Results demonstrate that DS usually
sion boundaries crossing the indecision regions as a very promising
presents better classification results using an average 20 base clas-
research direction. That also supports our hypothesis that, when
sifiers. Hence, using smaller pools of classifiers can not only im-
working with DS, we need an ensemble generation method that
prove the accuracy, but also, reduce the computational costs in-
generates local experts rather than global ones.
volved in DS techniques.

7.3. Region of competence deﬁnition 7.4. Prototype selection and generation for DS

As reported in [108] and [107], the definition of the region of The rationale for using PS techniques is that the performance of
competence plays a very important role in the classification perfor- DS techniques is very dependent on the distribution of DSEL. When
mance of a DS system since the local competence of the base clas- the samples in this set are not representative enough of the query
sifiers is estimated based on the samples belonging to this region. sample, the DS technique may not select the most competent clas-
Hence, we believe an interesting research direction is to study the sifiers to predict its label. This phenomenon may occur as a result
relationship between the samples belonging to the region of com- of a high degree of overlap between different classes or may be
petence and the selection of the base classifiers; in other words, due to the presence of noise [108]. Another important aspect of
how the distribution of the region of competence changes the way editing the distribution of DSEL is that it can also significantly re-
the competence of the base classifiers is estimated. This relation- duce the computational complexity involved in applying dynamic
ship can be used in defining new ways of demarcating the region selection techniques, since the definition of the region of compe-
of competence and selecting classifiers, and additionally should be tence is conducted using the K-NN technique, which can be very
taken into account during the classifier generation procedure. costly when dealing with large datasets.
Furthermore, the majority of the dynamic selection techniques Recent works have pointed out that the use of Prototype Selec-
use a fixed neighborhood size. This value of K is often used for tion (PS) [169] techniques can significantly improve the classifica-
multiple classification problems, regardless of their complexity. An- tion accuracy of several dynamic selection techniques [104,107]. In
other interesting future work would involve the prediction of the this case, the PS techniques are applied to edit the distribution of

best K value according to the specificity of the complexity of DSEL in the training stage. The edited DSEL, denoted by DSEL , is
the classification problem, or, having a variable neighborhood size then used for extracting the regions of competence during the gen-
which could also change dynamically, based on the location of the eralization phase. In [104], we evaluated the impact of six Proto-
query sample in the feature space. In this case, one could for in- type Selection techniques in relation to the classification accuracy
stance, have a higher K value for samples that are closer to the and reduction of computational time of several dynamic selection
decision border in order to reduce the influence of noisy samples. techniques. The experimental analysis demonstrated that PS tech-
Another interesting aspect regarding the region of competence niques, such as the Relative Neighborhood Graph, significantly im-
is the work conducted by Oliveira et al. [109], in which the authors prove the classification performance of all DS methods considered
studied the estimated regions of competence in order to know in the analysis. Moreover, it can also significantly reduce the size
R.M.O. Cruz et al. / Information Fusion 41 (2018) 195–216 213

of DSEL and improve the recognition performance of several DS 7.6. Cost-sensitive dynamic selection
techniques.
However, as reported in [104], the techniques that present the In many real-world classification tasks, such as medical diagno-
best results for DS are not the ones that obtained the best clas- sis and credit analysis, it is crucial to take into account misclas-
sification accuracy when the 1-NN is considered as the classifi- sification costs associated with each class. Furthermore, different
cation scheme. For instance, the Generational Genetic Algorithm classification costs can stem from the imbalanced nature of some
(GGA) and the CHC Adaptive Search Algorithm obtained the worse classification problems. In other words, due to the imbalanced na-
performance in the experimental study with DS techniques; how- ture of the problem, the system may require different costs for the
ever, they were among the top 5 best performing algorithms when majority and minority classes.
the 1-NN was considered as the classification scheme [169]. The Recently, several cost-sensitive ensemble approaches have been
performance obtained by the CHC method was significantly worse proposed, such as cost-sensitive trees ensemble [173], boost-
when compared to the other PS techniques, as well as the baseline ing [174] and the cost-sensitive ensemble methods based on the
result (i.e., the system without using PS). ROC space [175,176]. However, these approaches are all based on
This may be attributable to the fact that these techniques use static ensembles. To date the dynamic selection literature consid-
the classification accuracy of the 1-NN technique in the fitness ers all classes with the same cost, and no rejection mechanism has
function for editing the dataset. An interesting direction for future been proposed for DS [24]. Classification systems that deal with
work would involve the use of the performance of DS techniques, such applications often require a built-in rejection mechanism to
(e.g., the accuracy of the OLA technique) as a criterion within those avoid committing errors in very risky predictions. Thus, we believe
techniques in order to adapt the distribution of DSEL for the use of that a definition of cost-sensitive DCS and DES techniques is an-
DS techniques, rather than the 1-NN classifier. Moreover, the use other promising research direction for dynamic selection.
of Prototype Generation techniques [170] should also be evaluated
for improving the performance of DS techniques, especially when 7.7. Imbalanced datasets
dealing with imbalanced distributions [107].
Imbalanced learning has recently attracted much attention from
the pattern recognition community since this kind of data is very
common in real-world applications, e.g., biomedical data and spam
detection. Classification in the presence of class imbalance is chal-
lenging since the usual method of training and selecting standard
7.5. Diversity for DS classification models is based on classification accuracy. However,
if we take into account the classification accuracy in such cases,
One of the most studied aspects in MCS is the concept of diver- the minority class could be totally ignored.
sity. It is known that we need diversity in the classifier ensemble, In our opinion, dynamic selection techniques can bring many
since combining classifiers that always produce the same decision benefits to this type of problem since they perform a local classi-
will not improve the recognition rate of the system. Diversity is of- fication. The selection of the ensemble of classifiers is performed
ten measured by the difference in the classifiers’ decision, such as taking into account only the neighborhood of the query sample,
in the Q-statistics and in the disagreement measures [131]. rather than the whole dataset. Thus, we believe that the classifier
To date, few dynamic selection techniques have utilized diver- selection scheme will not be biased towards the majority class.
sity together with different competence measures to perform en- To the best of our knowledge, there is just one publication that
semble selection [30,171]. However, the two DS techniques that discusses the use of dynamic selection for imbalanced distribu-
take into account diversity information did not present a good tions [126]. However, the proposed system was only applied to two
overall performance in our experimental analysis (as shown in credit scoring datasets, which is not enough to evaluate whether or
Section 6). not DS can cope with class imbalance. In addition, this paper did
In our opinion, when dynamic selection is considered, a diverse not take into consideration the use of data preprocessing such as
pool of classifiers is required, since, intuitively, a pool of diverse SMOTE and RAMO [177]. Hence, another interesting future work
classifiers means that there are several classifiers specialized in would involve the evaluation of DS techniques for imbalanced dis-
different regions of the feature space. Consequently, the classifier tributions, and possibly, the definition or adaptation of DS tech-
pool has a better coverage of the whole feature space [172]. How- niques for this kind of application.
ever, after selecting the EoC, we believe that we need to promote
consensus, rather than diversity, among the selected classifiers. In-
Acknowledgment
tuitively, the non-competent classifiers, for the corresponding local
region, will present high diversity when compared to the compe-
This work was supported by the Natural Sciences and Engineer-
tent ones, since they performed differently at the local level. The
ing Research Council of Canada (NSERC), the École de technologie
addition of diverse classifiers at the ensemble level can hinder the
supérieure (ÉTS Montréal) and CNPq (Conselho Nacional de Desen-
EoC decision. Moreover, if there is no consensus among the se-
volvimento Científico e Tecnológico).
lected base classifiers, the system may end up randomly selecting
the class of the query instance. This point is discussed by Sağlam
References
and Street in a recent publication [156], who evaluate the concept
of distant diversity.
[1] L.I. Kuncheva, A theoretical study on six classifier fusion strategies, IEEE Trans.
Another important point to be investigated is the impact of di- Pattern Anal. Mach. Intell. 24 (2) (2002) 281–286.
versity for dynamic ensembles, rather than static ones. The analy- [2] T.G. Dietterich, Ensemble methods in machine learning, in: International
Workshop on Multiple Classifier Systems, Springer, 20 0 0, pp. 1–15.
sis conducted in [70,131] considered only static combination rules
[3] L.I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms, Wi-
(e.g., Average, Product and Majority voting). In the case of DS tech- ley-Interscience, 2004.
niques, the impact of diversity could be analyzed at the pool level, [4] M. Fernández-Delgado, E. Cernadas, S. Barro, D. Amorim, Do we need hun-
i.e., before selecting the base classifiers, as well as at the instance dreds of classifiers to solve real world classification problems? J. Mach. Learn.
Res. 15 (2014) 3133–3181.
level, i.e., after the base classifiers are selected for the classification [5] D. Opitz, R. Maclin, Popular ensemble methods: an empirical study, J. Artif.
of the query. Intell. Res. 11 (1999) 169–198.
214 R.M.O. Cruz et al. / Information Fusion 41 (2018) 195–216

[6] R. Polikar, Ensemble based systems in decision making, IEEE Circuits Syst. [37] A. Tsymbal, M. Pechenizkiy, P. Cunningham, S. Puuronen, Dynamic integration
Mag. 6 (3) (2006) 21–45. of classifiers for handling concept drift, Inf. Fus. 9 (1) (2008) 56–68.
[7] S. Bashbaghi, E. Granger, R. Sabourin, G. Bilodeau, Dynamic selection of exem- [38] W. Qu, Y. Zhang, J. Zhu, Q. Qiu, Mining multi-label concept-drifting data
plar-svms for watch-list screening through domain adaptation, in: Proceed- streams using dynamic classifier ensemble, in: Asian Conference on Machine
ings of the 6th International Conference on Pattern Recognition Applications Learning, Springer, 2009, pp. 308–321.
and Methods (ICPRAM), 2017, pp. 738–745. [39] I. Mendialdua, J. Martínez-Otzeta, I. Rodriguez-Rodriguez, T. Ruiz-Vazquez,
[8] P.R.L. de Almeida, E.J. da Silva Júnior, T.M. Celinski, A. de Souza Britto, L.E.S. de B. Sierra, Dynamic selection of the best base classifier in one versus one,
Oliveira, A.L. Koerich, Music genre classification using dynamic selection of Knowl. Based Syst. 85 (2015) 298–306.
ensemble of classifiers, in: Systems, Man, and Cybernetics (SMC), 2012 IEEE [40] M. Galar, A. Fernández, E. Barrenechea, H. Bustince, F. Herrera, Dynamic clas-
International Conference on, IEEE, 2012, pp. 2700–2705. sifier selection for one-vs-one strategy: avoiding non-competent classifiers,
[9] S. Lessmann, B. Baesens, H.-V. Seow, L.C. Thomas, Benchmarking Pattern Recognit. 46 (12) (2013) 3412–3424.
state-of-the-art classification algorithms for credit scoring: an update of [41] Z.-L. Zhang, X.-G. Luo, S. García, J.-F. Tang, F. Herrera, Exploring the effective-
research, Eur. J. Oper. Res. 247 (1) (2015) 124–136. ness of dynamic ensemble selection in the one-versus-one scheme, Knowl.
[10] H. Xiao, Z. Xiao, Y. Wang, Ensemble classification based on supervised clus- Based Syst. 125 (2017) 53–63.
tering for credit scoring, Appl. Soft Comput. 43 (2016) 73–86. [42] L. Batista, E. Granger, R. Sabourin, Dynamic selection of generative–discrim-
[11] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, F. Herrera, A review inative ensembles for off-line signature verification, Pattern Recognit. 45 (4)
on ensembles for the class imbalance problem: bagging-, boosting-, and hy- (2012) 1326–1340.
brid-based approaches, IEEE Trans. Syst. Man. Cybern. Part C 42 (4) (2012) [43] S. Bashbaghi, E. Granger, R. Sabourin, G.-A. Bilodeau, Robust watch-list
463–484. screening using dynamic ensembles of svms based on multiple face repre-
[12] C. Porcel, A. Tejeda-Lorente, M. Martínez, E. Herrera-Viedma, A hybrid rec- sentations, Mach. Vis. Appl. (2017) 1–23.
ommender system for the selective dissemination of research resources in a [44] S. Bashbaghi, E. Granger, R. Sabourin, G.-A. Bilodeau, Dynamic ensembles of
technology transfer office, Inf. Sci. 184 (1) (2012) 1–19. exemplar-svms for still-to-video face recognition, Pattern Recognit 69 (2017)
[13] M. Jahrer, A. Töscher, R. Legenstein, Combining predictions for accurate 61–81.
recommender systems, in: Proceedings of the 16th ACM SIGKDD Interna- [45] C. Pagano, Adaptive classifier ensembles for face recognition in video-surveil-
tional Conference on Knowledge Discovery and Data Mining, ACM, 2010, lance, Ph.D. thesis, École de technologie supérieure, 2015.
pp. 693–702. [46] D. Ruta, B. Gabrys, Classifier selection for majority voting, Inf. Fus. 6 (1)
[14] D. Di Nucci, F. Palomba, R. Oliveto, A. De Lucia, Dynamic selection of classi- (2005) 63–81.
fiers in bug prediction: an adaptive method, IEEE Trans. Emerg. Topics Com- [47] R.P.W. Duin, The combining classifier: to train or not to train?, Proceedings of
put. Intell. 1 (3) (2017) 202–212. the 16th International Conference on Pattern Recognition 2 (2002) 765–770.
[15] A. Panichella, R. Oliveto, A. De Lucia, Cross-project defect prediction models: [48] L.K. Hansen, P. Salamon, Neural network ensembles, IEEE Trans. Pattern Anal.
L’union fait la force, in: Software Maintenance, Reengineering and Reverse Mach. Intell. 12 (10) (1990) 993–1001.
Engineering (CSMR-WCRE), 2014 Software Evolution Week-IEEE Conference [49] S.-B. Cho, J.H. Kim, Combining multiple neural networks by fuzzy integral for
on, IEEE, 2014, pp. 164–173. robust classification, IEEE Trans. Syst. Man Cybern. 25 (2) (1995) 380–384.
[16] G. Giacinto, F. Roli, L. Didaci, Fusion of multiple classifiers for intrusion detec- [50] L. Breiman, Bagging predictors, Mach. Learn. 24 (1996) 123–140.
tion in computer networks, Pattern Recognit. Lett. 24 (12) (2003) 1795–1803. [51] M. Skurichina, R.P.W. Duin, Bagging for linear classifiers, Pattern Recognit. 31
[17] G. Giacinto, R. Perdisci, M. Del Rio, F. Roli, Intrusion detection in computer (1998) 909–930.
networks by a modular ensemble of one-class classifiers, Inf. Fus. 9 (1) (2008) [52] Y. Freund, R.E. Schapire, A decision-theoretic generalization of on-line learn-
69–82. ing and an application to boosting, in: Proceedings of the Second European
[18] B. Krawczyk, L.L. Minku, J. Gama, J. Stefanowski, M. Woźniak, Ensemble learn- Conference on Computational Learning Theory, 1995, pp. 23–37.
ing for data stream analysis: a survey, Inf. Fus. 37 (2017) 132–156. [53] J. Feng, L. Wang, M. Sugiyama, C. Yang, Z.-H. Zhou, C. Zhang, Boosting and
[19] L.I. Kuncheva, Classifier ensembles for changing environments (2004) 1–15. margin theory, Front. Electr. Electron. Eng. 7 (1) (2012) 127–133.
[20] R. Polikar, L. Udpa, S. Udpa, S. Member, S. Member, V. Honavar, Learn++: an [54] A. Rahman, B. Verma, Novel layered clustering-based approach for generating
incremental learning algorithm for supervised neural networks, IEEE Trans. ensemble of classifiers, IEEE Trans. Neural Netw. 22 (5) (2011) 781–792.
Syst. Man Cybern. (C), Special Issue on Knowledge Management 31 (2001) [55] R.M. O. Cruz, G.D. C. Cavalcanti, T.I. Ren, An ensemble classifier for offline cur-
497–508. sive character recognition using multiple feature extraction techniques, Pro-
[21] M. Wozniak, M. Graña, E. Corchado, A survey of multiple classifier systems as ceedings of the International Joint Conference on Neural Networks (2010a)
hybrid systems, Inf. Fus. 16 (2014) 3–17. 744–751.
[22] L. Rokach, Ensemble-based classifiers, Artif. Intell. Rev. 33 (1) (2010) 1–39. [56] R.M. O. Cruz, G.D. C. Cavalcanti, T.I. Ren, Handwritten digit recognition us-
[23] Y. Ren, L. Zhang, P.N. Suganthan, Ensemble classification and regression-recent ing multiple feature extraction techniques and classifier ensemble, in: 17th
developments, applications and future directions, IEEE Comput. Intell. Mag. 11 International Conference on Systems, Signals and Image Processing, 2010,
(1) (2016) 41–53. pp. 215–218.
[24] A.S. Britto, R. Sabourin, L.E.S. de Oliveira, Dynamic selection of classifiers - a [57] A. Rahman, B. Verma, Effect of ensemble classifier composition on offline cur-
comprehensive review, Pattern Recognit. 47 (11) (2014) 3665–3680. sive character recognition, Inf. Process. Manage. 49 (4) (2013) 852–864.
[25] T. Woloszynski, M. Kurzynski, A probabilistic model of classifier competence [58] A. Schindler, R. Mayer, A. Rauber, Facilitating comprehensive benchmarking
for dynamic ensemble selection, Pattern Recognit. 44 (2011) 2656–2668. experiments on the million song dataset., in: ISMIR, 2012, pp. 469–474.
[26] A.H.R. Ko, R. Sabourin, u.S. Britto Jr., From dynamic classifier selection to dy- [59] T.K. Ho, The random subspace method for constructing decision forests, IEEE
namic ensemble selection, Pattern Recognit. 41 (2008) 1735–1748. Trans. Pattern Anal. Mach. Intell. 20 (1998) 832–844.
[27] R.M.O. Cruz, R. Sabourin, G.D.C. Cavalcanti, T.I. Ren, META-DES: a dynamic [60] M. Skurichina, R.P.W. Duin, Bagging, boosting and the random subspace
ensemble selection framework using meta-learning, Pattern Recognit. 48 (5) method for linear classifiers, Pattern Anal. Appl. 5 (2) (2002) 121–135.
(2015) 1925–1935. [61] R.P. Duin, D.M. Tax, Experiments with classifier combining rules, in: Interna-
[28] X. Zhu, X. Wu, Y. Yang, Dynamic classifier selection for effective mining from tional Workshop on Multiple Classifier Systems, 20 0 0, pp. 16–29.
noisy data streams, in: Proceedings of the 4th IEEE International Conference [62] W. Wang, P. Jones, D. Partridge, Diversity between neural networks and deci-
on Data Mining, 2004, pp. 305–312. sion trees for building multiple classifier systems, in: International Workshop
[29] L.I. Kuncheva, Clustering-and-selection model for classifier combination, in: on Multiple Classifier Systems, 20 0 0, pp. 240–249.
Fourth International Conference on Knowledge-Based Intelligent Information [63] L. Breiman, Random forests, Mach. Learn. 45 (1) (2001) 5–32.
Engineering Systems & Allied Technologies, 20 0 0, pp. 185–188. [64] L. Rokach, Decision forest: twenty years of research, Inf. Fus. 27 (2016)
[30] R.G.F. Soares, A. Santana, A.M.P. Canuto, M.C.P. de Souto, Using accuracy and 111–125.
diversity to select classifiers to build ensembles, in: Proceedings of the Inter- [65] J.J. Rodríguez, L.I. Kuncheva, C.J. Alonso, Rotation forest: a new classifier
national Joint Conference on Neural Networks, 2006, pp. 1310–1316. ensemble method, IEEE Trans. Pattern Anal. Mach. Intell. 28 (10) (2006)
[31] K. Woods, W.P. Kegelmeyer Jr., K. Bowyer, Combination of multiple classi- 1619–1630.
fiers using local accuracy estimates, IEEE Trans. Pattern Anal. Mach. Intell. 19 [66] G. Giacinto, F. Roli, Design of effective neural network ensembles for image
(1997) 405–410. classification purposes, Image Vis. Comput. 19 (9–10) (2001) 699–707.
[32] P.C. Smits, Multiple classifier systems for supervised remote sensing image [67] G. Giacinto, F. Roli, Design of effective neural network ensembles for image
classification based on dynamic classifier selection, IEEE Trans. Geosci. Re- classification purposes, Image Vis. Comput. 19 (9–10) (2001) 699–707.
mote Sens. 40 (4) (2002) 801–813. [68] M. Aksela, Comparison of classifier selection methods for improving commit-
[33] M. Sabourin, A. Mitiche, D. Thomas, G. Nagy, Classifier combination for hand- tee performance, in: International Workshop on Multiple Classifier Systems,
printed digit recognition, International Conference on Document Analysis and 2003, pp. 84–93.
Recognition (1993) 163–166. [69] R.M.O. Cruz, G.D.C. Cavalcanti, I.R. Tsang, R. Sabourin, Feature representation
[34] T. Woloszynski, M. Kurzynski, P. Podsiadlo, G.W. Stachowiak, A measure of selection based on classifier projection space and oracle analysis, Expert Syst.
competence based on random classification for dynamic ensemble selection, Appl. 40 (9) (2013) 3813–3827.
Inf. Fus. 13 (3) (2012) 207–213. [70] G. Brown, L.I. Kuncheva, “Good” and “bad” diversity in majority vote en-
[35] B. Krawczyk, M. Wozniak, Dynamic classifier selection for one-class classifi- sembles, in: International Workshop on Multiple Classifier Systems, Springer,
cation, Knowl. Based Syst. 107 (2016) 43–53. 2010, pp. 124–133.
[36] P.R.L. de Almeida, L.S. Oliveira, A. de Souza Britto Jr, R. Sabourin, Handling [71] E.M. dos Santos, R. Sabourin, P. Maupin, Overfitting cautious selection of clas-
concept drifts using dynamic selection of classifiers, in: Tools with Artificial sifier ensembles with genetic algorithms, Inf. Fus. 10 (2) (2009) 150–162.
Intelligence (ICTAI), 2016, pp. 989–995. [72] I. Partalas, G. Tsoumakas, I. Vlahavas, Focused ensemble selection: a diversi-
R.M.O. Cruz et al. / Information Fusion 41 (2018) 195–216 215

ty-based method for greedy ensemble selection, in: Proceeding of the 18th [106] P.R. Cavalin, R. Sabourin, C.Y. Suen, Dynamic selection approaches for multiple
European Conference on Artificial Intelligence, 2008, pp. 117–121. classifier systems, Neural Comput. Appl. 22 (3–4) (2013) 673–688.
[73] E.M. dos Santos, R. Sabourin, Classifier ensembles optimization guided by [107] R.M.O. Cruz, R. Sabourin, G.D. Cavalcanti, Prototype selection for dynamic
population oracle, in: IEEE Congress on Evolutionary Computation, 2011, classifier and ensemble selection, Neural Comput. Appl. (2016) 1–11.
pp. 693–698. [108] R.M.O. Cruz, R. Sabourin, G.D.C. Cavalcanti, A DEEP analysis of the META-DES
[74] B. Gabrys, D. Ruta, Genetic algorithms in classifier fusion, Appl. Soft Comput. framework for dynamic selection of ensemble of classifiers, CoRR (2015).
6 (4) (2006) 337–347. abs/1509.00825.
[75] Z.-H. Zhou, J. Wu, W. Tang, Ensembling neural networks: many could be bet- [109] D.V. Oliveira, G.D. Cavalcanti, R. Sabourin, Online pruning of base classi-
ter than all, Artif. Intell. 137 (1–2) (2002) 239–263. fiers for dynamic ensemble selection, Pattern Recognit. (2017), doi:10.1016/
[76] R.E. Banfield, L.O. Hall, K.W. Bowyer, W.P. Kegelmeyer, Ensemble diversity j.patcog.2017.06.030.
measures and their application to thinning, Inf. Fus. 6 (1) (2005) 49–62. [110] T.P.F. de Lima, A.T. Sergio, T.B. Ludermir, Improving classifiers and regions of
[77] L. Kuncheva, Fuzzy Classifier Design, vol. 49, Springer Science & Business Me- competence in dynamic ensemble selection, in: Intelligent Systems (BRACIS),
dia, 20 0 0. 2014 Brazilian Conference on, IEEE, 2014, pp. 13–18.
[78] R.P. Duin, D.M. Tax, Classifier conditional posterior probabilities, in: Joint [111] T.P.F. De Lima, T.B. Ludermir, Optimizing dynamic ensemble selection proce-
IAPR International Workshops on Statistical Techniques in Pattern Recogni- dure by evolutionary extreme learning machines and a noise reduction filter,
tion (SPR) and Structural and Syntactic Pattern Recognition (SSPR), Springer, in: Tools with Artificial Intelligence (ICTAI), 2013 IEEE 25th International Con-
1998, pp. 611–619. ference on, IEEE, 2013, pp. 546–552.
[79] J. Kittler, M. Hatef, R.P.W. Duin, J. Matas, On combining classifiers, IEEE Trans. [112] G. Giacinto, F. Roli, Dynamic classifier selection based on multiple classifier
Pattern Anal. Mach. Intell. 20 (1998) 226–239. behaviour, Pattern Recognit. 34 (2001) 1879–1881.
[80] T.K. Ho, J.J. Hull, S.N. Srihari, Decision combination in multiple classifier sys- [113] P.R. Cavalin, R. Sabourin, C.Y. Suen, Logid: an adaptive framework combining
tems, IEEE Trans. Pattern Anal. Mach. Intell. 16 (1) (1994) 66–75. local and global incremental learning for dynamic selection of ensembles of
[81] Y.S. Huang, C.Y. Suen, A method of combining multiple experts for the HMMs, Pattern Recognit. 45 (9) (2012) 3544–3556.
recognition of unconstrained handwritten numerals, IEEE Trans. Pattern Anal. [114] L. Rastrigin, R. Erenstein, Method of Collective Recognition, vol. 595, 1981. (in
Mach. Intell. 17 (1995) 90–94. Russian).
[82] L.I. Kuncheva, J.C. Bezdek, R.P.W. Duin, Decision templates for multiple [115] C. Lin, W. Chen, C. Qiu, Y. Wu, S. Krishnan, Q. Zou, Libd3c: ensemble classifiers
classifier fusion: an experimental comparison, Pattern Recognit. 34 (2001) with a clustering and dynamic selection strategy, Neurocomputing 123 (2014)
299–314. 424–435.
[83] Y. Lu, Knowledge integration in a multiple classifier system, Appl. Intell. 6 (2) [116] M.C. de Souto, R.G. Soares, A. Santana, A.M. Canuto, Empirical comparison
(1996) 75–86. of dynamic classifier selection methods based on diversity and accuracy
[84] G.L. Rogova, Combining the results of several neural network classifiers, Neu- for building ensembles, in: Neural Networks, 2008. IJCNN 2008.(IEEE World
ral Netw. 7 (5) (1994) 777–781. Congress on Computational Intelligence). IEEE International Joint Conference
[85] D.M.J. Tax, M. van Breukelen, R.P.W. Duin, J. Kittler, Combining multiple on, IEEE, 2008, pp. 1480–1487.
classifiers by averaging or by multiplying? Pattern Recognit. 33 (9) (20 0 0) [117] J. Wang, P. Neskovic, L.N. Cooper, Improving nearest neighbor rule with a sim-
1475–1485. ple adaptive distance measure, Pattern Recognit. Lett. 28 (2007) 207–213.
[86] A.K. Jain, R.P.W. Duin, J. Mao, Statistical pattern recognition: a review, IEEE [118] J. Wang, P. Neskovic, L.N. Cooper, Neighborhood size selection in the k-n-
Trans. Pattern Anal. Mach. Intell. 22 (1) (20 0 0) 4–37. earest-neighbor rule using statistical confidence, Pattern Recognit. 39 (2006)
[87] L. Lam, C.Y. Suen, Optimal combinations of pattern classifiers, Pattern Recog- 417–423.
nit. Lett. 16 (9) (1995) 945–954. [119] B. Sierra, E. Lazkano, I. Irigoien, E. Jauregi, I. Mendialdua, K nearest neighbor
[88] D.H. Wolpert, Stacked generalization, Neural Netw. 5 (1992) 241–259. equality: giving equal chance to all existing classes, Inf. Sci. 181 (23) (2011)
[89] Š. Raudys, Trainable fusion rules. ii. small sample-size effects, Neural Netw. 5158–5168.
19 (10) (2006) 1517–1527. [120] T. Woloszynski, M. Kurzynski, On a new measure of classifier competence ap-
[90] Š. Raudys, Trainable fusion rules. i. large sample size case, Neural Netw. 19 plied to the design of multiclassifier systems, in: International Conference on
(10) (2006) 1506–1516. Image Analysis and Processing (ICIAP), 20 09, pp. 995–10 04.
[91] R.A. Jacobs, M.I. Jordan, S.J. Nowlan, G.E. Hinton, Adaptive mixtures of local [121] L. Batista, E. Granger, R. Sabourin, Dynamic ensemble selection for off-line
experts, Neural Comput 3 (1991) 79–87. signature verification, in: International Workshop on Multiple Classifier Sys-
[92] S. Masoudnia, R. Ebrahimpour, Mixture of experts: a literature survey, Artif. tems, 2011, pp. 157–166.
Intell. Rev. (2014) 1–19. [122] K. M., W. T., R. Lysiak, On two measures of classifier competence for dy-
[93] S.E. Yuksel, J.N. Wilson, P.D. Gader, Twenty years of mixture of experts, IEEE namic ensemble selection - experimental comparative analysis, in: Interna-
Trans. Neural Netw. Learn. Syst. 23 (8) (2012) 1177–1193. tional Symposium on Communications and Information Technologies, 2010,
[94] H. Cevikalp, R. Polikar, Local classifier weighting by quadratic programming, pp. 1108–1113.
IEEE Trans. Neural Netw. 19 (10) (2008) 1832–1838. [123] A.L. Brun, A.S.B. Jr., L.S. Oliveira, F. Enembreck, R. Sabourin, Contribution of
[95] D. Jiménez, Dynamically weighted ensemble neural networks for classifica- data complexity features on dynamic classifier selection, in: International
tion, in: Neural Networks Proceedings, 1998. IEEE World Congress on Com- Joint Conference on Neural Networks (IJCNN), 2016, pp. 4396–4403.
putational Intelligence. The 1998 IEEE International Joint Conference on, 1, [124] R.M.O. Cruz, R. Sabourin, G.D.C. Cavalcanti, On meta-learning for dynamic en-
IEEE, 1998, pp. 753–756. semble selection, in: 22nd International Conference on Pattern Recognition
[96] D. Štefka, M. Holeňa, Dynamic classifier aggregation using interaction-sensi- (ICPR), 2014, pp. 1230–1235.
tive fuzzy measures, Fuzzy Sets Syst. 270 (2015) 25–52. [125] F. Pinto, C. Soares, J. Mendes-Moreira, Chade: metalearning with classifier
[97] R.M.O. Cruz, R. Sabourin, G.D.C. Cavalcanti, META-DES.H: a dynamic ensemble chains for dynamic combination of classifiers, in: Joint European Conference
selection technique using meta-learning and a dynamic weighting approach, on Machine Learning and Knowledge Discovery in Databases, Springer, 2016,
in: International Joint Conference on Neural Networks, 2015, pp. 1–8. pp. 410–425.
[98] L.M. Vriesmann, A.S. Britto, L.S. Oliveira, A.L. Koerich, R. Sabourin, Combining [126] J. Xiao, L. Xie, C. He, X. Jiang, Dynamic classifier ensemble model for customer
overall and local class accuracies in an oracle-based method for dynamic en- classification with imbalanced class distribution, Expert Syst. Appl. 39 (2012)
semble selection, in: Neural Networks (IJCNN), 2015 International Joint Con- 3668–3675.
ference on, IEEE, 2015, pp. 1–7. [127] E.M. Dos Santos, R. Sabourin, P. Maupin, A dynamic overproduce-and-choose
[99] R.M.O. Cruz, R. Sabourin, G.D.C. Cavalcanti, Meta-des. Oracle: meta-learning strategy for the selection of classifier ensembles, Pattern Recognit. 41 (2008)
and feature selection for dynamic ensemble selection, Inf. Fus. 38 (2017) 2993–3009.
84–103. [128] M. Wozniak, Hybrid Classifiers: Methods of Data, Knowledge, and Classifier
[100] M. Wozniak, M. Zmyslony, Designing fusers on the basis of discrimi- Combination, Springer, 2013.
nants–evolutionary and neural methods of training, in: International Confer- [129] G. Giacinto, F. Roli, Methods for dynamic classifier selection, in: Image Anal-
ence on Hybrid Artificial Intelligence Systems, Springer, 2010, pp. 590–597. ysis and Processing, 1999. Proceedings. International Conference on, IEEE,
[101] L. Didaci, G. Giacinto, F. Roli, G.L. Marcialis, A study on the performances 1999, pp. 659–664.
of dynamic classifier selection based on local accuracy estimation, Pattern [130] R.M.O. Cruz, Dynamic Selection of Ensemble of Classifiers Using Meta-Learn-
Recognit. 38 (11) (2005) 2188–2191. ing, Ph.D. thesis, École de Technologie Supérieure, 2016.
[102] R.M. O. Cruz, G.D. C. Cavalcanti, T.I. Ren, A method for dynamic ensemble [131] C.A. Shipp, L.I. Kuncheva, Relationships between combination methods and
selection based on a filter and an adaptive distance to improve the quality of measures of diversity in combining classifiers, Inf. Fus. 3 (2002) 135–148.
the regions of competence, Proceedings of the International Joint Conference [132] T.K. Ho, M. Basu, Complexity measures of supervised classification problems,
on Neural Networks (2011) 1126–1133. IEEE Trans. Pattern Anal. Mach. Intell. 24 (3) (2002) 289–300.
[103] L. Didaci, G. Giacinto, Dynamic classifier selection by adaptive k-near- [133] M. Galar, A. Fernández, E. Barrenechea, H. Bustince, F. Herrera, An overview
est-neighbourhood rule, in: International Workshop on Multiple Classifier of ensemble methods for binary classifiers in multi-class problems: experi-
Systems, Springer, 2004, pp. 174–183. mental study on one-vs-one and one-vs-all schemes, Pattern Recognit. 44 (8)
[104] R.M.O. Cruz, R. Sabourin, G.D.C. Cavalcanti, Analyzing different prototype se- (2011) 1761–1776.
lection techniques for dynamic classifier and ensemble selection, in: Interna- [134] M. Galar, A. Fernández, E. Barrenechea, F. Herrera, Drcw-ovo: distance-based
tional Joint Conference on Neural Networks (IJCNN), 2017, pp. 3959–3966. relative competence weighting combination for one-vs-one strategy in multi-
[105] T. Woloszynski, M. Kurzynski, A measure of competence based on randomized -class problems, Pattern Recognit. 48 (1) (2015) 28–42.
reference classifier for dynamic ensemble selection, in: International Confer- [135] D.M.J. Tax, One-class classification: Concept Learning in the Absence of Coun-
ence on Pattern Recognition (ICPR), 2010, pp. 4194–4197. ter-Examples, Ph.D. thesis, Technische Universiteit Delft, 2001.
216 R.M.O. Cruz et al. / Information Fusion 41 (2018) 195–216

[136] B. Antosik, M. Kurzynski, New measures of classifier competence – heuristics dures for redundant systems of hypotheses, in: Multiple Hypothesenprü-
and application to the design of multiple classifier systems., in: Computer fung/Multiple Hypotheses Testing, Springer, 1988, pp. 100–115.
Recognition Systems, vol. 4, 2011, pp. 197–206. [159] S. Garcia, F. Herrera, An extension on“statistical comparisons of classifiers
[137] J. Martins, L.S. Oliveira, A. Britto, R. Sabourin, Forest species recognition based over multiple data sets”for all pairwise comparisons, J. Mach. Learn. Res. 9
on dynamic classifier selection and dissimilarity feature vector representa- (Dec) (2008) 2677–2694.
tion, Mach Vis Appl 26 (2–3) (2015) 279–293. [160] J. Derrac, S. García, D. Molina, F. Herrera, A practical tutorial on the use of
[138] A.T. Sergio, T.P. de Lima, T.B. Ludermir, Dynamic selection of forecast combin- nonparametric statistical tests as a methodology for comparing evolutionary
ers, Neurocomputing 218 (2016) 37–50. and swarm intelligence algorithms, Swarm Evol. Comput. 1 (1) (2011) 3–18.
[139] M. Kurzynski, A. Wolczowski, Dynamic selection of classifiers ensemble ap- [161] J. Zhu, H. Zou, S. Rosset, T. Hastie, Multi-class adaboost, Stat. Interface 2 (3)
plied to the recognition of emg signal for the control of bioprosthetic hand, (2009) 349–360.
in: Control, Automation and Systems (ICCAS), 2011 11th International Confer- [162] S. Bernard, L. Heutte, S. Adam, Influence of hyperparameters on random forest
ence on, IEEE, 2011, pp. 382–386. accuracy, in: International Workshop on Multiple Classifier Systems, Springer,
[140] M. Kurzynski, M. Krysmann, P. Trajdos, A. Wolczowski, Multiclassifier system 2009, pp. 171–180.
with hybrid learning applied to the control of bioprosthetic hand, Comput. [163] S. Bernard, L. Heutte, S. Adam, Forest-rk: a new random forest induction
Biol. Med. 69 (2016) 286–297. method, in: International Conference on Intelligent Computing, Springer,
[141] M. Kurzynski, M. Krysmann, P. Trajdos, A. Wolczowski, Two-stage multiclas- 2008, pp. 430–437.
sifier system with correction of competence of base classifiers applied to the [164] M.T. Hagan, M.B. Menhaj, Training feedforward networks with the marquardt
control of bioprosthetic hand, in: Tools with Artificial Intelligence (ICTAI), algorithm, IEEE Trans. Neural Netw. 5 (6) (1994) 989–993.
2014 IEEE 26th International Conference on, IEEE, 2014, pp. 620–626. [165] A. Roy, R.M.O. Cruz, R. Sabourin, G.D. Cavalcanti, Meta-regression based pool
[142] D.J. Hand, W.E. Henley, Statistical classification methods in consumer credit size prediction scheme for dynamic selection of classifiers, in: 23rd Interna-
scoring: a review, J. R. Statist. Soc. 160 (3) (1997) 523–541. tional Conference on Pattern Recognition (ICPR), 2016, pp. 216–221.
[143] C.N. Silla Jr, A.L. Koerich, C.A. Kaestner, The latin music database., in: ISMIR, [166] M.A. Souza, G.D. Cavalcanti, R.M.O. Cruz, R. Sabourin, On the characterization
2008, pp. 451–456. of the oracle for dynamic classifier selection, in: International Joint Confer-
[144] J. Milgram, M. Cheriet, R. Sabourin, Estimating accurate multi-class proba- ence on Neural Networks (IJCNN), 2017, pp. 332–339.
bilities with support vector machines, in: Neural Networks, 2005. IJCNN’05. [167] E.M. dos Santos, R. Sabourin, P. Maupin, A dynamic overproduce-and-choose
Proceedings. 2005 IEEE International Joint Conference on, vol. 3, IEEE, 2005, strategy for the selection of classifier ensembles, Pattern Recognit. 41 (10)
pp. 1906–1911. (20 08) 2993–30 09.
[145] L.S. Oliveira, R. Sabourin, F. Bortolozzi, C.Y. Suen, Automatic recognition of [168] A. Roy, R.M.O. Cruz, R. Sabourin, G.D.C. Cavalcanti, Meta-learning recom-
handwritten numerical strings: a recognition and verification strategy, IEEE mendation of default size of classifier pool for META-DES, vol. 216, 2016,
Trans. Pattern Anal. Mach. Intell. 24 (11) (2002) 1438–1454. pp. 351–362.
[146] F. Vargas, M. Ferrer, C. Travieso, J. Alonso, Off-line handwritten signature [169] S. Garcia, J. Derrac, J. Cano, F. Herrera, Prototype selection for nearest neigh-
gpds-960 corpus, in: Document Analysis and Recognition, 2007. ICDAR 2007. bor classification: taxonomy and empirical study, IEEE Trans. Pattern Anal.
Ninth International Conference on, vol. 2, IEEE, 2007, pp. 764–768. Mach. Intell. 34 (3) (2012) 417–435.
[147] A. Wolczowski, M. Kurzynski, Human-machine interface in bioprosthesis con- [170] I. Triguero, J. Derrac, S. García, F. Herrera, A taxonomy and experimental study
trol using EMG signal classification, Expert Syst. 27 (1) (2010) 53–70. on prototype generation for nearest neighbor classification, IEEE Trans. Syst.
[148] K. Bache, M. Lichman, UCI Machine Learning Repository, 2013. Man Cybern. Part C 42 (1) (2012) 86–100.
[149] R.D. King, C. Feng, A. Sutherland, Statlog: Comparison of Classification Algo- [171] R. Lysiak, M. Kurzynski, T. Woloszynski, Probabilistic approach to the dynamic
rithms on Large Real-World Problems, 1995. ensemble selection using measures of competence and diversity of base clas-
[150] J. Alcalá-Fdez, A. Fernández, J. Luengo, J. Derrac, S. García, KEEL data-mining sifiers, in: International Conference on Hybrid Artificial Intelligence Systems,
software tool: data set repository, integration of algorithms and experimental 2011, pp. 229–236.
analysis framework, Mult. Val. Log. Soft Comput. 17 (2–3) (2011) 255–287. [172] T.K. Ho, Complexity of classification problems and comparative advantages of
[151] L. Kuncheva, Ludmila kuncheva collection LKC, 2004. combined classifiers, in: International Workshop on Multiple Classifier Sys-
[152] R.P.W. Duin, P. Juszczak, D. de Ridder, P. Paclik, E. Pekalska, D.M. Tax, Prtools, tems, Springer, 20 0 0, pp. 97–106.
a matlab toolbox for pattern recognition, 2004. [173] B. Krawczyk, M. Woźniak, G. Schaefer, Cost-sensitive decision tree ensembles
[153] S. Mirjalili, A. Lewis, S-shaped versus v-shaped transfer functions for binary for effective imbalanced classification, Appl. Soft Comput. 14 (2014) 554–562.
particle swarm optimization, Swarm Evol. Comput. 9 (2013) 1–14. [174] Y. Sun, M.S. Kamel, A.K. Wong, Y. Wang, Cost-sensitive boosting for classifica-
[154] M. Friedman, The use of ranks to avoid the assumption of normality implicit tion of imbalanced data, Pattern Recognit. 40 (12) (2007) 3358–3378.
in the analysis of variance, J. Am. Stat. Assoc. 32 (200) (1937) 675–701. [175] S. Bernard, C. Chatelain, S. Adam, R. Sabourin, The multiclass roc front method
[155] J. Demšar, Statistical comparisons of classifiers over multiple data sets, J. for cost-sensitive classification, Pattern Recognit. 52 (2016) 46–60.
Mach. Learn. Res. 7 (2006) 1–30. [176] C. Dubos, S. Bernard, S. Adam, R. Sabourin, Roc-based cost-sensitive classifica-
[156] Ş.Y. Sağlam, W.N. Street, Distant diversity in dynamic class prediction, Ann. tion with a reject option, in: International Conference on Pattern Recognition
Oper. Res. (2016) 1–15. (ICPR), 2016, pp. 3320–3325.
[157] A. Benavoli, G. Corani, F. Mangili, Should we really use post-hoc tests based [177] J. Díez-Pastor, J.J. Rodríguez, C. García-Osorio, L.I. Kuncheva, Diversity tech-
on mean-ranks, J. Mach. Learn. Res. 17 (5) (2016) 1–10. niques improve the performance of the best imbalance learning ensembles,
[158] B. Bergmann, G. Hommel, Improvements of general multiple test proce- Inf. Sci. 325 (2015) 98–117.

Using Accuracy and Diversity To Select Classifiers To Build Ensembles
No ratings yet
Using Accuracy and Diversity To Select Classifiers To Build Ensembles
7 pages
Pattern Recognition: Rafael M.O. Cruz, Robert Sabourin, George D.C. Cavalcanti, Tsang Ing Ren
No ratings yet
Pattern Recognition: Rafael M.O. Cruz, Robert Sabourin, George D.C. Cavalcanti, Tsang Ing Ren
11 pages
+ Main Research Streams On MCS - A Survey of Multiple Classifier Systems As Hybrid Systems 2014
No ratings yet
+ Main Research Streams On MCS - A Survey of Multiple Classifier Systems As Hybrid Systems 2014
15 pages
KRAWXZYKINFFUS2017
No ratings yet
KRAWXZYKINFFUS2017
86 pages
1.1 What Is Ensemble Learning: 1.2 Model Selection
No ratings yet
1.1 What Is Ensemble Learning: 1.2 Model Selection
18 pages
Cruz Ijcnn 2015
No ratings yet
Cruz Ijcnn 2015
8 pages
3.4 Ensemble Feature Selection For High-Dimensional Data A Stability Analysis Across Multiple Domains
No ratings yet
3.4 Ensemble Feature Selection For High-Dimensional Data A Stability Analysis Across Multiple Domains
23 pages
1 s2.0 S0167404816301572 Main
No ratings yet
1 s2.0 S0167404816301572 Main
18 pages
A Class-Based Feature Selection Method For Ensemble Systems
No ratings yet
A Class-Based Feature Selection Method For Ensemble Systems
6 pages
4 Classification
No ratings yet
4 Classification
20 pages
A Study of Some Data Mining Classification Techniques
No ratings yet
A Study of Some Data Mining Classification Techniques
4 pages
Classifier Selection For Majority Voting
No ratings yet
Classifier Selection For Majority Voting
19 pages
Chapter 4
No ratings yet
Chapter 4
31 pages
Literature Review CCSIT205
No ratings yet
Literature Review CCSIT205
9 pages
Da Silva Et Al. - 2021 - Automatic Recommendation Method For Classifier Ensemble Structure Using Meta-Learning
No ratings yet
Da Silva Et Al. - 2021 - Automatic Recommendation Method For Classifier Ensemble Structure Using Meta-Learning
15 pages
Clustering Before Classification
No ratings yet
Clustering Before Classification
3 pages
Toward Integrating Feature Selection Algorithms For Classification and Clustering-M7s PDF
No ratings yet
Toward Integrating Feature Selection Algorithms For Classification and Clustering-M7s PDF
12 pages
Hybrid Feature Selection
No ratings yet
Hybrid Feature Selection
8 pages
CH 4
No ratings yet
CH 4
21 pages
Ships Classification Basing On Acoustic Signatures
No ratings yet
Ships Classification Basing On Acoustic Signatures
14 pages
Tan 2021 J. Phys. Conf. Ser. 1994 012016
No ratings yet
Tan 2021 J. Phys. Conf. Ser. 1994 012016
6 pages
Review of Data Mining Classification Techniques
No ratings yet
Review of Data Mining Classification Techniques
4 pages
Survey On Multiclass Classification Methods
No ratings yet
Survey On Multiclass Classification Methods
9 pages
Data Mining UNIT-III R20 Syllabus
No ratings yet
Data Mining UNIT-III R20 Syllabus
50 pages
PR Assignment 01 - Seemal Ajaz (206979)
No ratings yet
PR Assignment 01 - Seemal Ajaz (206979)
7 pages
Classification[1]
No ratings yet
Classification[1]
45 pages
Analysis of Classification Algorithm in Data Mining
No ratings yet
Analysis of Classification Algorithm in Data Mining
3 pages
An Adaptive Ensemble Classifier For Mining Concept Drifting Data Streams
No ratings yet
An Adaptive Ensemble Classifier For Mining Concept Drifting Data Streams
20 pages
6 الى13 داتا ماينق
No ratings yet
6 الى13 داتا ماينق
19 pages
The Decision Tree Classifier: Design and Potential: Abstmct-Tiús Paper Presents The Basic Concepts of A Multistage
No ratings yet
The Decision Tree Classifier: Design and Potential: Abstmct-Tiús Paper Presents The Basic Concepts of A Multistage
6 pages
Classifier Systems & Genetic Algorithms: 3.1 Introducing The Classifier System
No ratings yet
Classifier Systems & Genetic Algorithms: 3.1 Introducing The Classifier System
24 pages
Classifier Systems & Genetic Algorithms: 3.1 Introducing The Classifier System
No ratings yet
Classifier Systems & Genetic Algorithms: 3.1 Introducing The Classifier System
24 pages
10 1016@j Inffus 2013 04 006
No ratings yet
10 1016@j Inffus 2013 04 006
31 pages
41 j48 Naive Bayes Weka
No ratings yet
41 j48 Naive Bayes Weka
5 pages
Classification
No ratings yet
Classification
45 pages
(Amores) Notes On Multiple Instance Learning - Review, Taxonomy and Comparative Study
No ratings yet
(Amores) Notes On Multiple Instance Learning - Review, Taxonomy and Comparative Study
15 pages
Spatial and Temporal Data Mining
No ratings yet
Spatial and Temporal Data Mining
95 pages
A Hybrid Approach From Ant Colony Optimization and K-Neare
No ratings yet
A Hybrid Approach From Ant Colony Optimization and K-Neare
13 pages
Artigo Smallex
No ratings yet
Artigo Smallex
17 pages
Welcome To International Journal of Engineering Research and Development (IJERD)
No ratings yet
Welcome To International Journal of Engineering Research and Development (IJERD)
8 pages
Classification and Prediction
No ratings yet
Classification and Prediction
69 pages
Module 3 Notes (1)
No ratings yet
Module 3 Notes (1)
31 pages
ML Unit II
No ratings yet
ML Unit II
183 pages
08 Class Basic
No ratings yet
08 Class Basic
81 pages
V3I4201499b50 PDF
No ratings yet
V3I4201499b50 PDF
8 pages
A Survey On Evolutionary Multiobjective Feature Selection in Classification Approaches Applications and Challenges
No ratings yet
A Survey On Evolutionary Multiobjective Feature Selection in Classification Approaches Applications and Challenges
21 pages
2173572
No ratings yet
2173572
55 pages
Data Mining & Knowledge Discovery
No ratings yet
Data Mining & Knowledge Discovery
34 pages
Desale Classification IDS
No ratings yet
Desale Classification IDS
5 pages
Classification lecture 1
No ratings yet
Classification lecture 1
51 pages
17219wPg#s
No ratings yet
17219wPg#s
26 pages
2015 Elsevier Pattern Matching Based Classification Using Ant Colony Optimization Based Feature Selection
No ratings yet
2015 Elsevier Pattern Matching Based Classification Using Ant Colony Optimization Based Feature Selection
12 pages
Long-Term Applying: ON Computers, 1971
No ratings yet
Long-Term Applying: ON Computers, 1971
4 pages
Multiple Classifier System For Remote Sensing Image Classification A Review
No ratings yet
Multiple Classifier System For Remote Sensing Image Classification A Review
29 pages
Module 3
No ratings yet
Module 3
64 pages
2007 Zhao Lu DESIGN OF DECISION TREE VIA KERNELIZED HIERARCHICAL CLUSTERING FOR MULTICLASS SUPPORT VECTOR MACHINES
No ratings yet
2007 Zhao Lu DESIGN OF DECISION TREE VIA KERNELIZED HIERARCHICAL CLUSTERING FOR MULTICLASS SUPPORT VECTOR MACHINES
17 pages
Class Basic
No ratings yet
Class Basic
75 pages
Discovering New Knowledge - Data Mining
No ratings yet
Discovering New Knowledge - Data Mining
55 pages
Unit-4 DM
No ratings yet
Unit-4 DM
15 pages
K Nearest Neighbor Algorithm: Fundamentals and Applications
From Everand
K Nearest Neighbor Algorithm: Fundamentals and Applications
Fouad Sabry
No ratings yet
Artificial Intelligence: Introduction To
No ratings yet
Artificial Intelligence: Introduction To
43 pages
A Step by Step Backpropagation Example - Matt Mazur
No ratings yet
A Step by Step Backpropagation Example - Matt Mazur
10 pages
ML NOTES MOD 4
No ratings yet
ML NOTES MOD 4
27 pages
Kecerdasan Buatan: Artificial Intelligence
No ratings yet
Kecerdasan Buatan: Artificial Intelligence
30 pages
Unit I Notes Machine Learning Techniques
No ratings yet
Unit I Notes Machine Learning Techniques
21 pages
Mini Project: Helmet Detection and License Plate Number Recognition
100% (1)
Mini Project: Helmet Detection and License Plate Number Recognition
14 pages
Assignment B 3 Customer Churn Modeling
No ratings yet
Assignment B 3 Customer Churn Modeling
7 pages
Provisional Merit List of Minor Course 2025 YYOR9z
No ratings yet
Provisional Merit List of Minor Course 2025 YYOR9z
18 pages
Support Vector Machine (SVM Classifier) Implemenation in Python With Scikit-Learn
No ratings yet
Support Vector Machine (SVM Classifier) Implemenation in Python With Scikit-Learn
21 pages
Ifet College of Engineering (An Autonomous Institution) Department of Electronics and Communication Engineering Question Bank
No ratings yet
Ifet College of Engineering (An Autonomous Institution) Department of Electronics and Communication Engineering Question Bank
3 pages
One-Shot Learning in FaceNet
No ratings yet
One-Shot Learning in FaceNet
11 pages
Suguna 1 IEEE Xplore Conference
No ratings yet
Suguna 1 IEEE Xplore Conference
7 pages
SLA - Class Test - 4 - AnswerKey
No ratings yet
SLA - Class Test - 4 - AnswerKey
2 pages
Loan Approval Predictor: Presented By: Garvit Chaudhary Anuprakash
No ratings yet
Loan Approval Predictor: Presented By: Garvit Chaudhary Anuprakash
11 pages
Prudhvi Entity Aug
No ratings yet
Prudhvi Entity Aug
14 pages
Unsupervised Machine Learning
No ratings yet
Unsupervised Machine Learning
16 pages
18. Discovering_the_Depths_of_Cotton_Leaf_Disease_Detection_Integrating_Hypertuned_Residual_Networks_with_GradCAM_XAI_for_Enhanced_Understanding_and_Diagnosis
No ratings yet
18. Discovering_the_Depths_of_Cotton_Leaf_Disease_Detection_Integrating_Hypertuned_Residual_Networks_with_GradCAM_XAI_for_Enhanced_Understanding_and_Diagnosis
6 pages
基于Inception与Residual组合网络的农作物病虫害识别
No ratings yet
基于Inception与Residual组合网络的农作物病虫害识别
6 pages
AlphaGo Mastering The Game of Go With Deep Neural Networks and Tree Search
100% (2)
AlphaGo Mastering The Game of Go With Deep Neural Networks and Tree Search
273 pages
Module - 5 - ANN
No ratings yet
Module - 5 - ANN
50 pages
Introduction: Introduction To Soft Computing Introduction To Fuzzy Sets and Fuzzy Logic Systems Introduction
No ratings yet
Introduction: Introduction To Soft Computing Introduction To Fuzzy Sets and Fuzzy Logic Systems Introduction
1 page
Machine Learning SVM - Supervised
No ratings yet
Machine Learning SVM - Supervised
32 pages
Download (Ebook) Python Deep Learning: Understand how deep neural networks work and apply them to real-world tasks by Vasilev, Ivan ISBN 9781837638505, 1837638500 ebook All Chapters PDF
100% (12)
Download (Ebook) Python Deep Learning: Understand how deep neural networks work and apply them to real-world tasks by Vasilev, Ivan ISBN 9781837638505, 1837638500 ebook All Chapters PDF
65 pages
Deep+Learning+Mind+Map+PDF+Download
No ratings yet
Deep+Learning+Mind+Map+PDF+Download
1 page
Aspect Based Sentiment Analysis Using Fine-Tuned BERT Model With Deep Context Features
No ratings yet
Aspect Based Sentiment Analysis Using Fine-Tuned BERT Model With Deep Context Features
12 pages
CIT Hare Rama Chakraborthy AIDS
No ratings yet
CIT Hare Rama Chakraborthy AIDS
1 page
Chapter 1 (Rich & Knight)
88% (32)
Chapter 1 (Rich & Knight)
38 pages
Unit-Wise 30 Questions
No ratings yet
Unit-Wise 30 Questions
2 pages
Machine Learning: April 2022
No ratings yet
Machine Learning: April 2022
32 pages
A Text Classification Model Based On GCN and BiGRU Fusion
No ratings yet
A Text Classification Model Based On GCN and BiGRU Fusion
5 pages

Dynamic Classifier Selection

Uploaded by

Dynamic Classifier Selection

Uploaded by

Information Fusion 41 (2018) 195–216

Contents lists available at ScienceDirect

Dynamic classiﬁer selection: Recent advances and perspectives

1. Introduction classiﬁed. DS has become an active research topic in the multiple

In this paper, we present an updated taxonomy of dynamic 2. Basic concepts

1. Different initializations: If the training process is initialization

2.3.2. Trainable 2.3.3. Dynamic weighting

4. Dynamic selection techniques 4.3. Local classiﬁer accuracy (LCA)

• f1: The similarity in terms of complexity, which is measured by

Application Pool generation DS methods Ref.

Database No. of instances Dimensionality No. of classes Source

Adult 48,842 14 2 UCI

In this section, we present an empirical comparison between 18 6.1. Datasets

Dataset DES-RRC META-DES META-DES.O DES-KL DES-P KNORA-U

Pima 77.64(2.73) 79.03(2.24) 77.53(2.24) 77.97(2.64) 76.87(1.87) 78.84(2.18)

Hypothesis Berg p Hypothesis Wilcoxon p

META-DES.O vs DES-P 2.88E−5 ••• META-DES vs DES-KL 5.21E−6 •••

namic selection techniques and categorize them based on the pro-

You might also like