Combining Active Learning with Concept Drift Detection for Data Stream Mining
Abstract—Most data stream classifier learning methods assume that the true class of an incoming object becomes available right after the instance has been processed, and that the new labeled instance may be used to update the classifier's model, detect drift or capture novel concepts. However, the assumption that we have unlimited and infinite access to class labels is very naive and usually implies a very high labeling cost. Therefore the applicability of many supervised techniques is limited in real-life stream analytics scenarios. Active learning emerges as a potential solution to this problem, concentrating on selecting only the most valuable instances and learning an accurate predictive model with as few labeling queries as possible. However, learning from data streams differs from online learning, as the distribution of examples may change over time. Therefore, an active learning strategy must be able to handle concept drift and quickly adapt to the evolving nature of data. In this paper we present novel active learning strategies designed to tackle such changes effectively. We assume that most labeling effort is required when concept drift occurs, as we need a representative sample of the new concept to properly retrain the predictive model. Therefore, we propose active learning strategies that are guided by a drift detection module, saving the budget for difficult and evolving instances. The three proposed strategies are based on learner uncertainty, dynamic allocation of the budget over time and search space randomization. An experimental evaluation of the proposed methods proves their usefulness for reducing the labeling effort in learning from drifting data streams.

Keywords—machine learning; data stream mining; concept drift; active learning; drift detection

I. INTRODUCTION

Contemporary machine learning problems are much more complex than the ones we faced 10 or 20 years ago. With the advent of the big data era we need to address emerging problems deeply connected with the nature of the analyzed instances. We can identify the five V's of big data: volume, velocity, variety, veracity and value. Let us take a look at the problem of velocity. This paradigm assumes that data is in constant motion: it arrives continuously and thus must be handled in real time. This is further connected with the notion of volume, as data will arrive for a potentially infinite amount of time, flooding both the processing and storage systems [1], [2]. Such a problem is known in the literature as a data stream [3], [4]. This forces us to develop new methods able to handle learning from such an ever-growing collection of instances under constraints such as time and memory limitations. However, learning from data streams differs from traditional online learning, as here the properties of data may change over time. As an example, consider the malware detection problem: malicious software is far from static, as it evolves over time to elude ever-improving security systems. Such a phenomenon is known as concept drift [5]. Efficient data stream mining methods must assume the presence of this problem and be able to tackle it efficiently by constantly adapting to the non-stationary distribution of data [6]. The challenge lies in how to properly use the incoming objects to keep the learning model updated while limiting the costs imposed by constantly modifying the recognition system [7].

When mining data streams, unlabeled objects are abundant, as they arrive over time at a rate specific to the analyzed problem. However, labels for these instances may be costly to obtain due to the required human input (labor cost). In some applications we may obtain true class labels at very small cost (e.g., weather prediction), but this is not true for most problems and is connected with the label delay issue. In most problems obtaining a label requires constant access to a human expert or some kind of oracle. This is subject to various constraints, e.g., financial (we need to pay the expert), time (objects may appear faster than the expert can handle them), logistical (the expert may not be able to work 24/7) or resources (some expert-based procedures, like laboratory tests, cannot be repeated continuously). Sometimes access to the true label is delayed even when an oracle is available, e.g., the true label in the credit approval problem becomes available ca. 2 years after the decision, while some medical diagnoses can be confirmed only after laboratory tests that take a few weeks. Nevertheless, most of the methods described in the literature for learning from streams naively assume that true labels are available at all times upon request. This assumption is highly unrealistic and limits the usefulness of many supervised techniques in real-life tasks [8], [9]. Therefore, methods for selecting only the most valuable samples for labeling are of crucial importance to the data stream mining community. Here, active learning has been identified as a promising solution to this challenge [10], [11]. This approach concentrates on how to select objects for labeling instead of requesting labels for all objects. The problem is well known and extensively discussed in static [12] and online scenarios [13]. However, there are only a few works discussing active learning for data streams [14], [15], [16], [17], [18], especially in the presence of concept drift [10]. The difference between active learning in the online and the data stream scenario lies in the expectation of changes.
II. MINING DATA STREAMS WITH CONCEPT DRIFT

Four main categories of approaches for handling concept drift can be distinguished. Let us briefly present them.

Methods with triggers, based on so-called drift detectors, aim at identifying the moment when a change appears or is likely to appear and alarm the recognition system [19]. The detector is an external module that monitors the properties of the data stream in a supervised, semi-supervised or unsupervised manner. It is important to point out that supervised drift detectors require full access to the true class labels or to the performance of the classifier in use, which in real-life scenarios is almost impossible, as discussed in the previous section. On the other hand, unsupervised drift detection methods cannot detect a real concept drift in cases where the statistical properties of data did not change (e.g., classes have swapped places) [20]. Therefore, semi-supervised drift detection seems to be the best option.

Online learners are classifiers that constantly update their structure while processing the incoming instances [21].

III. ACTIVE LEARNING STRATEGIES GUIDED BY DRIFT DETECTION

In this section we describe the proposed active learning strategies guided by drift detection for evolving data streams.

A. Preliminaries

Let us assume that our stream consists of a potentially infinite set of examples DS = {(x1, j1), (x2, j2), ..., (xk, jk), ...}, where xk stands for the feature vector (xk ∈ X) describing the k-th object and jk for its label (jk ∈ M), which is assigned by an oracle and which, of course, the learning algorithm has to pay for. As mentioned before, we want to reduce the label querying cost, so we introduce a budget B that expresses how many instances we can afford to label. We assume that 0 < B < 1; for B = 0 and B = 1 we would have a fully unlabeled and a fully labeled data stream, respectively. A labeling strategy is a realization of the active learning paradigm that allows us to evaluate whether we are interested in obtaining the true label of the currently analyzed sample. The output of such a strategy is a Boolean variable indicating the decision regarding the label query.
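The budget bookkeeping behind these preliminaries can be sketched in a few lines of Python. This is purely illustrative; the class and method names are ours, not taken from the paper, and the `query` rule shown here is the simplest possible one (label whenever the spent fraction is still under B).

```python
class LabelingStrategy:
    """Decides, per instance, whether to query the oracle for a true label.

    Illustrative sketch of the preliminaries above: B is a fraction of the
    stream we can afford to label (0 < B < 1); the output is a Boolean.
    """

    def __init__(self, budget):
        assert 0.0 < budget < 1.0  # B = 0 / B = 1: fully unlabeled / labeled
        self.budget = budget
        self.seen = 0      # number of instances observed so far
        self.labeled = 0   # number of labels purchased so far

    def spent(self):
        # Fraction of the stream labeled so far (the labeling cost b).
        return self.labeled / self.seen if self.seen else 0.0

    def query(self, x):
        # Simplest rule: keep labeling while the spent fraction is below B.
        self.seen += 1
        decision = self.spent() < self.budget
        if decision:
            self.labeled += 1
        return decision


strategy = LabelingStrategy(budget=0.2)
decisions = [strategy.query(x) for x in range(1000)]
print(sum(decisions) / len(decisions))  # close to the assumed budget 0.2
```

The strategies proposed later in the paper replace the trivial `query` rule with randomized and uncertainty-based decisions, but keep the same Boolean interface and budget accounting.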
B. Proposed framework

Let us now present a general framework for the proposed active learning strategies. We construct it on an online learning scenario for data streams with concept drift. For detecting changes in data we use a drift detector module, realized as the ADWIN2 drift detector [27] due to its low computational complexity and proven efficiency. It uses only the labeled samples coming from active learning, thus not imposing any additional costs on the proposed system. When the accuracy of the classifier begins to decrease, we start to train a new classifier in the background using the arriving objects. In case of a change being detected, the new classifier replaces the old one. For each incoming object we check whether the labeling strategy conditions are fulfilled (they are triggered randomly or by a loss of the classifier's confidence). However, we are most interested in obtaining labels for new objects appearing after concept drift. Quickly gathering a representative sample allows for early preparation of the new classifier and efficient replacement of the outdated learner. Therefore, we should manage our budget and dynamically allocate it over time when needed. We propose to create a feedback loop between the labeling strategies and the drift detection module: in case of an alarm being raised or a change being detected, we increase the labeling rate in order to probe the emerging concept. Let R stand for the labeling ratio, which depends on the answer of the drift detector. The labeling ratios should be ordered as follows: R(static) < R(alarm) < R(change). This allows us to control the budget and save it for obtaining new knowledge for the recognition system. The details of the proposed framework are given in Algorithm 1.

Algorithm 1: Proposed general framework for active learning from drifting data streams.
input: budget B, labeling rate R, labeling strategy S(x, R), classifier Ψ, drift detector D
labeling cost b ← 0
while end of stream = FALSE do
    obtain new object x from the stream
    if b < B then
        if S(x, R) = TRUE then
            obtain label y of object x
            b ← b + 1
            update classifier Ψ with (x, y)
            update drift detector D with (x, y)
            if drift warning = TRUE then
                start a new classifier Ψnew
                increase labeling rate R
            else
                if drift detected = TRUE then
                    replace Ψ with Ψnew
                    further increase labeling rate R
                else
                    return to initial labeling rate R
            if Ψnew exists then
                update classifier Ψnew with (x, y)

C. Random strategy++

This is a very simple active learning strategy that randomly draws instance labels with probability equal to the assumed budget B. We propose to improve the label query when a change is being detected by increasing the labeling probability according to the output of the drift detector (alarm or change detected). The details of this strategy are given in Algorithm 2.

Algorithm 2: Labeling strategy RAND++(x, R, B)
input: new object x, labeling rate adjustment R ∈ [0, 1], budget B
Result: labeling ∈ [TRUE, FALSE]
generate a uniform random variable λ ∈ [0, 1]
if drift warning then
    λ ← λ − R
else
    if drift detected then
        λ ← λ − 2R
labeling ← I(λ ≤ B)

D. Variable uncertainty strategy++

This strategy is based on monitoring the certainty of the classifier Ψ, expressed by its support functions FΨ(x, j) for object x belonging to the j-th class. It aims to label the least certain instances within a time interval. A time-variable threshold imposed on the classifier's certainty is used; it adjusts itself depending on the incoming data in order to balance the budget use over time. For static parts of the stream the classifier's certainty stabilizes and the threshold is increased, so that only the most uncertain objects are labeled. When the drift detector reports an alarm or a change, we start to rapidly decrease the threshold in order to gather a higher number of labeled objects and quickly adapt the new model to the current state of the stream. The details of this strategy are given in Algorithm 3.

Algorithm 3: Labeling strategy VAR-UN++(x, s, θ, R, Ψ)
input: new object x, threshold θ, threshold adjustment s ∈ [0, 1], labeling rate adjustment R ∈ [0, 1], R > s, trained classifier Ψ
Result: labeling ∈ [TRUE, FALSE]
initialize θ and store its latest value
if max_{m∈M} FΨ(x, m) < θ then
    decrease the uncertainty threshold as follows:
    if drift warning then
        θ ← θ − R
    else
        if drift detected then
            θ ← θ − 2R
        else
            θ ← θ − s
    labeling ← TRUE
else
    increase the uncertainty threshold θ ← θ + s
    labeling ← FALSE
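The two labeling strategies above translate almost line by line into Python. The sketch below is an illustrative rendering under our own naming (the paper gives only pseudocode): `drift_state` stands for the detector output and takes the values 'static', 'warning' or 'drift'.

```python
import random


def rand_pp(budget, drift_state, rate_adj, rng=random.random):
    """RAND++ (Algorithm 2): random labeling, boosted on drift feedback."""
    lam = rng()                      # uniform random variable λ in [0, 1]
    if drift_state == 'warning':
        lam -= rate_adj              # λ ← λ − R
    elif drift_state == 'drift':
        lam -= 2 * rate_adj          # λ ← λ − 2R
    return lam <= budget             # labeling ← I(λ ≤ B)


class VarUncertaintyPP:
    """VAR-UN++ (Algorithm 3): time-variable threshold on classifier certainty."""

    def __init__(self, threshold=1.0, step=0.01, rate_adj=0.03):
        self.theta = threshold       # uncertainty threshold θ
        self.s = step                # threshold adjustment s
        self.rate_adj = rate_adj     # labeling rate adjustment R (R > s)

    def query(self, max_support, drift_state):
        # max_support = max_j F_Ψ(x, j), the classifier's top support value.
        if max_support < self.theta:
            # Uncertain instance: label it and lower θ (faster under drift).
            if drift_state == 'warning':
                self.theta -= self.rate_adj      # θ ← θ − R
            elif drift_state == 'drift':
                self.theta -= 2 * self.rate_adj  # θ ← θ − 2R
            else:
                self.theta -= self.s             # θ ← θ − s
            return True
        # Certain instance: skip it and raise θ.
        self.theta += self.s                     # θ ← θ + s
        return False


# With a fixed λ = 0.5, a static stream rejects the query, a detected
# drift lowers λ enough to accept it.
print(rand_pp(budget=0.3, drift_state='static', rate_adj=0.15, rng=lambda: 0.5))  # False
print(rand_pp(budget=0.3, drift_state='drift', rate_adj=0.15, rng=lambda: 0.5))   # True
```

Note how both strategies share the same feedback pattern: the detector state only shifts a quantity (λ or θ) that is then compared against the budget or the support value.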
E. Randomized variable uncertainty strategy++

This is a modification of the previous strategy that perturbs the threshold by a random factor. This allows for labeling some of the examples for which the classifier displays high certainty, in order not to miss a possible drift that may appear in any part of the decision space. However, this comes at the expense of sacrificing some of the uncertain instances. Thus this strategy is expected to perform worse than its predecessor on static streams, but to adapt faster to occurring changes. The details of this strategy are given in Algorithm 4.

Algorithm 4: Labeling strategy R-VAR-UN++(x, s, θ, δ, R, Ψ)
input: new object x, threshold θ, threshold adjustment s ∈ [0, 1], labeling rate adjustment R ∈ [0, 1], R > s, threshold random variance δ, trained classifier Ψ
Result: labeling ∈ [TRUE, FALSE]
initialize θ and store its latest value
η ← random multiplier ∈ N(1, δ)
θrand ← θ × η
if max_{j∈M} FΨ(x, j) < θrand then
    decrease the uncertainty threshold as follows:
    if drift warning then
        θ ← θ − R
    else
        if drift detected then
            θ ← θ − 2R
        else
            θ ← θ − s
    labeling ← TRUE
else
    increase the uncertainty threshold θ ← θ + s
    labeling ← FALSE

IV. EXPERIMENTAL STUDY

In this section we present the experimental evaluation of the proposed active learning methods for drifting data streams.

A. Set-up

For non-stationary data streams there are still only a few publicly available data sets to work with. Most of them are artificially generated, with only some real-life examples. Following the standard approaches found in the literature, we decided to use both artificial and real-life data sets, the details of which can be found in Table I.

TABLE I: Details of data stream benchmarks used in the experiments.

Data set      Objects    Features  Classes  Drift type
Airlines      539 383    7         2        unknown
Electricity   45 312     7         2        unknown
Forest Cover  581 012    53        7        unknown
RBF           1 000 000  20        4        gradual
Hyperplane    1 000 000  10        2        incremental
Tree          1 000 000  10        6        sudden recurring

We compare the proposed strategies (RAND++, VAR-UN++ and R-VAR-UN++) with their basic versions that do not use information from the drift detector [10]. We use the following parameters for these strategies: threshold adjustment s = 0.01, labeling rate adjustment R = 0.03 and threshold random variance δ = 1. We analyze budget sizes B ∈ {0.05, 0.10, ..., 0.60}, calculated over a time window of 2500 instances. A Hoeffding tree is selected as the base classifier.

For evaluating classifiers we use the prequential accuracy metric. The Wilcoxon signed-rank test is adopted as a non-parametric statistical procedure to perform pairwise comparisons between the classifier trained on the fully labeled stream and the classifiers using active learning strategies with varying budgets.

B. Results and discussion

Figure 1 presents detailed prequential accuracies for the six examined strategies with varying budget sizes over six stream benchmarks, while Table II compares the single best accuracies of the three proposed active learning strategies with a classifier trained on a fully labeled stream. Please note that our aim is to get as close as possible to the accuracy of a classifier with full access to class labels, while using as low a budget as possible.

TABLE II: Comparison of averaged prequential accuracies for a Hoeffding tree trained on a fully labeled data stream (FULL) and the best result obtained with the active learning strategies.

Dataset       FULL   RAND++  VAR-UN++  R-VAR-UN++
Airlines      69.38  67.25   65.69     66.02
Electricity   81.17  78.95   79.59     80.20
Forest Cover  80.34  71.67   74.28     74.39
RBF           93.47  92.07   92.26     92.98
Hyperplanes   83.16  81.95   82.03     82.21
Tree          69.98  68.05   69.11     69.71

From these results we can observe that only for the Electricity dataset were the proposed active learning strategies merely similar to the reference ones. For the remaining stream benchmarks we observe a significant gain in accuracy when the feedback from the drift detector is utilized by the label query. Additionally, the proposed improved strategies perform very well even with limited budgets, offering balanced effectiveness regardless of the budget setting. This is especially vivid for the Airlines, Forest Cover and Hyperplanes datasets. This allows us to conclude that, for a limited budget, the introduced labeling queries are concentrated mainly on the moments when drift takes place, thus better sampling the changed distribution and allowing for rapid construction of a more competent classifier for the current concept.

Results of the Wilcoxon test over multiple datasets are presented in Table III. We can see that the proposed strategies obtain results very similar to a classifier trained on the fully labeled stream. Using as little as 15% of the data we are able to induce a classifier that does not differ statistically significantly from one that has access to all labels. This is a very important observation, which proves that by carefully labeling only the most difficult and evolving instances we can obtain comparable accuracy at a greatly decreased cost.
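The prequential accuracy used in this evaluation follows a test-then-train protocol: each labeled instance first tests the current model and is then used to update it. A minimal sketch in plain Python, with a trivial majority-class learner standing in for the Hoeffding tree used in the actual experiments (all names here are ours):

```python
from collections import Counter


def prequential_accuracy(stream, predict, update):
    """Test-then-train evaluation over a stream of (x, y) pairs:
    each instance is first used for testing, then for training."""
    correct = 0
    for x, y in stream:
        if predict(x) == y:   # test first ...
            correct += 1
        update(x, y)          # ... then train on the same instance
    return correct / len(stream)


# Toy stand-in learner: predicts the majority class seen so far.
counts = Counter()

def predict(x):
    return counts.most_common(1)[0][0] if counts else None

def update(x, y):
    counts[y] += 1

# A tiny "drifting" stream: class 'a' dominates first, then 'b' takes over.
stream = [(i, 'a') for i in range(8)] + [(i, 'b') for i in range(12)]
acc = prequential_accuracy(stream, predict, update)
print(acc)
```

The experiments in the paper additionally compute this metric over a sliding window of 2500 instances, so that old performance does not mask the behaviour on the current concept.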
Fig. 1: Accuracies on the examined datasets for a Hoeffding tree and a given labeling budget using the different active learning strategies (RAND, RAND++, VAR-UN, VAR-UN++, R-VAR-UN, R-VAR-UN++).
TABLE III: Wilcoxon tests comparing a Hoeffding tree trained on a fully labeled stream (FULL) with a Hoeffding tree trained under the selected labeling strategy and a fixed budget. The symbol "<" denotes that the classifier trained on the fully labeled data stream is statistically significantly better, and "=" that there are no statistically significant differences between the proposed active learning approach and the fully labeled stream.

                        Budget
Comparison              0.05     0.10     0.15     0.20     0.25     0.30     0.35     0.40     0.45     0.50     0.55     0.60
RAND++ vs. FULL         <0.3643  <0.2017  <0.0740  =0.0483  =0.4724  =0.4503  =0.4025  =0.3977  =0.3643  =0.3428  =0.3215  =0.3194
VAR-UN++ vs. FULL       <0.2916  <0.1866  =0.0492  =0.0446  =0.0418  =0.0378  =0.0382  =0.0357  =0.0321  =0.0277  =0.0273  =0.0270
R-VAR-UN++ vs. FULL     <0.2848  <0.1609  =0.0487  =0.0431  =0.0409  =0.0351  =0.0369  =0.0334  =0.0313  =0.0258  =0.0249  =0.0246
Finally, let us analyze how well the proposed and reference active learning strategies managed the concept drift occurrences. Figure 2 depicts the percentage of drift examples in the labeled set. From these figures we can see that the proposed strategies are better able to identify the instances that appear during the drift and use them to adapt the classification system. Our improved strategies label 2-3 times more examples during drift, which has a direct influence on the obtained accuracies. Additionally, we may see that the R-VAR-UN++ strategy is able to detect the highest number of drifting instances, thus proving our claim from Section III-E that threshold randomization is beneficial to the detection of changes occurring at any point of the decision space.

Fig. 2: Percentage of drift examples in the labeled set by the examined active learning strategies, averaged over all examined budgets: (a) Airlines, (b) Electricity, (c) Forest Cover, (d) RBF, (e) Hyperplane, (f) Tree.

V. CONCLUSIONS AND FUTURE WORKS

In this paper we have proposed three improved active learning strategies for mining drifting data streams. The novelty of our proposal lies in a direct feedback from the drift detection mechanism that controls the labeling ratio. This way we are able to dynamically allocate our budget and obtain labels for objects coming from the evolved distribution. This has a direct link to the accuracy of the classification procedure, as we are able to capture the changes in streams more quickly. We showed that our proposed strategies allow for highly accurate stream classification by increased label querying in the drifting moments, even when the available budget is small. Using statistical tests we have shown that the proposed active learning strategies are not significantly worse than using a fully labeled data stream. This contribution is a step forward towards using supervised learning methods in realistic stream settings. In our future work we plan to apply active learning strategies with a dynamic budget to multi-class novelty detection in streaming data.

ACKNOWLEDGMENT

This work was supported by the Polish National Science Center under grant no. DEC-2013/09/B/ST6/02264. All experiments were carried out using computer equipment sponsored by the EC under FP7, Coordination and Support Action, Grant Agreement Number 316097, ENGINE - European Research Centre of Network Intelligence for Innovation Enhancement (http://engine.pwr.wroc.pl/).

REFERENCES

[1] A. Cano, "A survey on graphic processing unit computing for large-scale data mining," Wiley Interdiscip. Rev. Data Min. Knowl. Discov., vol. 8, no. 1, 2018.
[2] H. T. Nguyen, M. T. Thai, and T. N. Dinh, "A billion-scale approximation algorithm for maximizing benefit in viral marketing," IEEE/ACM Trans. Netw., vol. 25, no. 4, pp. 2419–2429, 2017.
[3] M. M. Gaber, "Advances in data stream mining," Wiley Interdiscip. Rev. Data Min. Knowl. Discov., vol. 2, no. 1, pp. 79–85, 2012.
[4] S. Ramírez-Gallego, B. Krawczyk, S. García, M. Woźniak, and F. Herrera, "A survey on data preprocessing for data stream mining: Current status and future directions," Neurocomputing, vol. 239, pp. 39–57, 2017.
[5] J. Gama, I. Zliobaite, A. Bifet, M. Pechenizkiy, and A. Bouchachia, "A survey on concept drift adaptation," ACM Comput. Surv., vol. 46, no. 4, pp. 44:1–44:37, 2014.
[6] M. Woźniak, "A hybrid decision tree training method using data streams," Knowl. Inf. Syst., vol. 29, no. 2, pp. 335–347, 2011.
[7] I. Zliobaite, M. Budka, and F. T. Stahl, "Towards cost-sensitive adaptation: When is it worth updating your predictive model?" Neurocomputing, vol. 150, pp. 240–249, 2015.
[8] B. Cyganek and S. Gruszczynski, "Hybrid computer vision system for drivers' eye recognition and fatigue monitoring," Neurocomputing, vol. 126, pp. 78–94, 2014.
[9] Z. S. Abdallah, M. M. Gaber, B. Srinivasan, and S. Krishnaswamy, "Adaptive mobile activity recognition system with evolving data streams," Neurocomputing, vol. 150, pp. 304–317, 2015.
[10] I. Zliobaite, A. Bifet, B. Pfahringer, and G. Holmes, "Active learning with drifting streaming data," IEEE Trans. Neural Netw. Learning Syst., vol. 25, no. 1, pp. 27–39, 2014.
[11] S. Mohamad, A. Bouchachia, and M. Sayed Mouchaweh, "A bi-criteria active learning algorithm for dynamic data streams," IEEE Trans. Neural Netw. Learning Syst., 2018.
[13] L. Ma, S. Destercke, and Y. Wang, "Online active learning of decision trees with evidential data," Pattern Recognition, vol. 52, pp. 33–45, 2016.
[14] M. Bouguelia, Y. Belaïd, and A. Belaïd, "An adaptive streaming active learning strategy based on instance weighting," Pattern Recognition Letters, vol. 70, pp. 38–44, 2016.
[15] B. Kurlej and M. Woźniak, "Active learning approach to concept drift problem," Logic Journal of the IGPL, vol. 20, no. 3, pp. 550–559, 2012.
[16] H. Nguyen, W. K. Ng, and Y. Woon, "Concurrent semi-supervised learning with active learning of data streams," Trans. Large-Scale Data- and Knowledge-Centered Systems, vol. 8, pp. 113–136, 2013.
[17] M. Woźniak, B. Cyganek, A. Kasprzak, P. Ksieniewicz, and K. Walkowiak, "Active learning classifier for streaming data," in Hybrid Artificial Intelligent Systems - 11th International Conference, HAIS 2016, Seville, Spain, April 18-20, 2016, Proceedings, 2016, pp. 186–197.
[18] S. Mohamad, M. Sayed Mouchaweh, and A. Bouchachia, "Active learning for classifying data streams with unknown number of classes," Neural Networks, vol. 98, pp. 1–15, 2018.
[19] P. M. G. Jr., S. G. T. de Carvalho Santos, R. S. M. de Barros, and D. C. D. L. Vieira, "A comparative study on concept drift detectors," Expert Syst. Appl., vol. 41, no. 18, pp. 8144–8156, 2014.
[20] P. Sobolewski and M. Woźniak, "Concept drift detection and model selection with simulated recurrence and ensembles of statistical detectors," Journal of Universal Computer Science, vol. 19, no. 4, pp. 462–483, 2013.
[21] G. Melki, V. Kecman, S. Ventura, and A. Cano, "OLLAWV: online learning algorithm using worst-violators," Appl. Soft Comput., vol. 66, pp. 384–393, 2018.
[22] G. Hulten, L. Spencer, and P. M. Domingos, "Mining time-changing data streams," in Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, San Francisco, CA, USA, August 26-29, 2001, 2001, pp. 97–106.
[23] P. Domingos and G. Hulten, "A general framework for mining massive data streams," Journal of Computational and Graphical Statistics, vol. 12, pp. 945–949, 2003.
[24] U. Yun and G. Lee, "Sliding window based weighted erasable stream pattern mining for stream data applications," Future Generation Comp. Syst., vol. 59, pp. 1–20, 2016.
[25] B. Krawczyk, L. L. Minku, J. Gama, J. Stefanowski, and M. Woźniak, "Ensemble learning for data stream analysis: A survey," Information Fusion, vol. 37, pp. 132–156, 2017.
[26] P. R. L. Almeida, L. S. Oliveira, A. S. B. Jr., and R. Sabourin, "Adapting dynamic classifier selection for concept drift," Expert Syst. Appl., vol. 104, pp. 67–85, 2018.
[27] A. Bifet and R. Gavaldà, "Learning from time-changing data with adaptive windowing," in Proceedings of the Seventh SIAM International Conference on Data Mining, April 26-28, 2007, Minneapolis, Minnesota, USA, 2007, pp. 443–448.