Deep Transfer Learning For IoT Attack Detection
18, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.3000476
ABSTRACT The digital revolution has substantially changed our lives, and the Internet-of-Things (IoT) plays a prominent role in it. The rapid spread of IoT into most corners of life, however, leads to various emerging cybersecurity threats. Therefore, detecting and preventing potential attacks in IoT networks have recently attracted paramount interest from both academia and industry. Among various attack detection approaches, machine learning-based methods, especially deep learning, have demonstrated great potential thanks to their early detection capability. However, these machine learning techniques only work well when a huge volume of labeled data from IoT devices can be collected. The labeling process is usually time consuming and expensive and thus may not be able to keep up with quickly evolving IoT attacks in reality. In this paper, we propose a novel deep transfer learning (DTL) method that learns from data collected from multiple IoT devices, not all of which are labeled. Specifically, we develop a DTL model based on two AutoEncoders (AEs). The first AE (AE1) is trained on the source datasets (source domains) in supervised mode using the label information, and the second AE (AE2) is trained on the target datasets (target domains) in an unsupervised manner without label information. The transfer learning process attempts to force the latent representation (the bottleneck layer) of AE2 to be similar to the latent representation of AE1. After that, the latent representation of AE2 is used to detect attacks in the incoming samples in the target domain. We carry out intensive experiments on nine recent IoT datasets to evaluate the performance of the proposed model. The experimental results demonstrate that the proposed DTL model significantly improves the accuracy in detecting IoT attacks compared to the baseline deep learning technique and two recent DTL approaches.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 8, 2020 107335
L. Vu et al.: DTL for IoT Attack Detection
However, machine learning-based methods only perform well under an important assumption, i.e., that the distributions of the training data and the predicting data are similar [18]. Nevertheless, in many practical applications, this assumption does not always hold [19], [20]. In network security especially, new types of attacks (e.g., zero-day attacks) can be found on a daily basis [16]. As such, the practical IoT data seen by machine learning models (in the predicting/online phase) is usually very different from the data used during the training/offline phase. To alleviate this problem, a large volume of labeled training data from multiple IoT devices is often required. However, manually labeling a huge volume of data is very time consuming and expensive [21], [22]. This limits the practical deployment of machine learning-based methods for detecting IoT attacks in various scenarios.
Given the above, this work proposes a novel deep transfer learning (DTL) approach based on the AutoEncoder (AE) to enable further applications of machine learning in IoT attack detection. The proposed model is referred to as Multi-Maximum Mean Discrepancy AE (MMD-AE). MMD-AE can be trained on a dataset including both labeled samples (in the source domain) and unlabeled samples (in the target domain). After training, MMD-AE is used to predict IoT attacks in the incoming traffic in the target domain. Specifically, MMD-AE consists of two AEs: AE1 and AE2. AE1 is trained with labeled data while AE2 is trained on the unlabeled data. The whole model, i.e., MMD-AE, is trained to drive the latent representation of AE2 close to the latent representation of AE1. As a result, the latent representation of AE2 can be used to classify the unlabeled IoT data in the target domain. The major contributions of this paper are as follows:
• We propose a novel DTL model based on AEs, i.e., MMD-AE, that transfers knowledge, i.e., labeled information, from the source domain to the target domain. This model helps to lessen the problem of "lack of label information" in traffic datasets collected from IoT devices.
• We introduce the Maximum Mean Discrepancy (MMD) metric to minimize the distance between multiple hidden layers of AE1 and multiple hidden layers of AE2. This metric helps to improve the effectiveness of the knowledge transferred from the source to the target domain in IoT attack detection systems.
• We evaluate our proposed method using nine IoT attack datasets and compare its performance with the canonical deep learning model and the state-of-the-art TL models [18], [31]. The experimental results demonstrate the advantage of our proposed model over the other tested methods.
The rest of the paper is organized as follows. Section II highlights recent works on IoT attack detection. In Section III, we define a DTL model and briefly describe the AE architecture. The proposed model is then presented in Section IV. Section V discusses the experiment settings and Section VI provides detailed analysis and discussion related to the experimental results. Finally, Section VII concludes with future work.
II. RELATED WORK
There are two main directions for cyberattack detection, i.e., signature-based and machine learning-based approaches, e.g., [8]–[10], [21]. The signature-based methods maintain a database of predefined signatures (i.e., patterns) that correspond to known IoT attacks and perform the detection task by comparing these to the incoming data stream [11]–[13], [24]. Zhang and Green II [11] proposed a lightweight and low-complexity algorithm to prevent Distributed Denial of Service (DDoS) attacks in which each IoT working node performs deep packet inspection to find attack signatures. If a sender repeatedly sends requests with the same content, these requests are flagged as malicious. Dietz et al. [12] proposed a solution to proactively block the spreading of IoT attacks and isolate vulnerable IoT devices. Each IoT device is verified in two steps, i.e., scanning for open ports and services and using a predefined list of commonly known credentials to check authentication. After that, a list of predefined rules is used to isolate the vulnerable IoT devices. Nobakht et al. [13] proposed a solution for IoT attack detection using Software Defined Network with the OpenFlow protocol to address malicious behaviours and block intruders from accessing the IoT devices. This method incorporates a database of all known in-home IoT devices along with the corresponding patterns of potential security risks. Then, the detection method simply matches the IoT traffic against the signatures of security risks stored in the database. The advantage of the signature-based methods is that they provide an attack detection system with a low false positive rate [24]. However, they require prior human knowledge about the behaviours of known IoT attacks to design the database of attack signatures. Thus, the accuracy of these methods depends on the quality of the signature databases. Moreover, as the size of the databases increases, the processing time (i.e., search time) can become excessive [24].
The machine learning-based methods first train the detection models from data samples collected in IoT networks. Then, the trained models are used to classify new incoming IoT data samples as normal or attack data. The popular traditional machine learning algorithms for IoT attack detection are Decision Tree (C4.5), Support Vector Machine (SVM), K-Nearest Neighbour, Bayes Classifier, and Neural Networks [8], [24]. Recently, the deep learning approach has been widely used and has achieved high performance in detecting cyberattacks [3], [9], [15]–[17]. Among deep learning approaches, AE-based models project the original data to a new latent representation space to improve the accuracy in detection tasks [3], [15], [16]. Nevertheless, to train a good machine learning model for detecting IoT attacks, it is usually required to label a huge volume of training data as normal or attack [24]. Moreover, general machine learning models often need to assume that the data distribution of training datasets
is similar to the data distribution of predicting datasets. This assumption, however, is usually not practical [19], [20], [25].
Recently, DTL techniques have been used to handle the above issues of machine learning methods where training data from a source domain and test data from a target domain are drawn from different distributions. A DTL model attempts to reduce the distribution divergence between the source domain and the target domain [25]. As a result, the knowledge learned for a task (e.g., classification) on the source domain can be used to support the learning task on a similar target domain [19], [25]–[27]. Gou et al. [28] applied an instance-based DTL approach in network intrusion detection that requires label information from the target domain. Zhao et al. [29] proposed a feature-based DTL technique that projects the source and the target domain into a latent subspace via linear transformations, i.e., Principal Component Analysis (PCA), for network attack detection. However, PCA is a linear mapping technique that only works well with a simple data feature set [30].
Our proposed DTL model in this paper, i.e., MMD-AE, leverages a non-linear mapping, i.e., AE, to improve the performance of IoT attack detection on the target domain. The key idea of our proposed DTL model (compared with previous AE-based DTL methods [18], [31]) is that the knowledge of features in every encoding layer (instead of only the bottleneck layer, as in previous works) is transferred to the target domain. This helps to force the latent representation of the target domain to be similar to the latent representation of the source domain. The experimental results illustrate the effectiveness of our proposed DTL model on the IoT attack detection task in the target domain.
III. FUNDAMENTAL BACKGROUND
This section presents the fundamental background of our proposed model.
A. TRANSFER LEARNING
Transfer learning (TL) refers to the situation where what has been learned in one learning task is exploited to improve generalization in another learning task [33]. Fig. 1 compares traditional machine learning methods, including deep learning, and TL models. In traditional machine learning, the datasets and training processes are separated for different learning tasks. Thus, no knowledge is retained, accumulated, or transferred from one model to another. In TL, the knowledge (i.e., features, weights, etc.) from previously trained models in a source domain is used for training newer models in a target domain. Moreover, TL can even handle the problems of having less data or no label information in the target domain.
TL is often used to transfer knowledge learnt from a source domain to a target domain where the target domain is different from the source domain but their data distributions are related. We consider a TL method with an input space $X$ and its label space $Y$; the two domain distributions are the source domain $D_S$ and the target domain $D_T$. Two corresponding samples are given, i.e., the source sample $D_S = (X_S, Y_S) = (x_S^i, y_S^i)_{i=1}^{n_S}$ and the target sample $D_T = (X_T) = (x_T^i)_{i=1}^{n_T}$, where $n_S$ and $n_T$ are the numbers of samples in the source domain and the target domain, respectively. In this paper, the TL model based on a deep neural network, i.e., deep transfer learning (DTL), is trained on the labeled data in the source domain and the unlabeled data in the target domain. After that, the trained model is used for IoT attack detection in the target domain.
B. AUTOENCODERS
This subsection describes the structure and the training process of an AutoEncoder (AE), which is fundamental to our DTL model. The reason we develop the TL models based on AE is that these models have proved to be the most effective deep neural networks for IoT attack detection [2], [3], [15], [16]. Additionally, to prove the effectiveness of the proposed model, we will compare it with the previous DTL techniques that are also based on AE.
An AE is a neural network trained to reconstruct the network's input at its output [34]. This network has two parts, i.e., encoder and decoder, as shown in Fig. 2. Let $W$, $W'$, $b$, and $b'$ denote the weight matrices and the bias vectors of the encoder and the decoder, respectively, and let $X = \{x^1, x^2, \dots, x^n\}$ be a training dataset. $\phi = (W, b)$ and $\theta = (W', b')$ are the parameter sets for training the encoder and the decoder, respectively. Let $q_\phi$ denote the encoder and $z^i$ denote the representation of the input data $x^i$. The encoder maps the input $x^i$ to the latent representation $z^i$ (as in (1)). The decoder $p_\theta$ attempts to map the latent representation $z^i$ back
FIGURE 1. Traditional machine learning vs. transfer learning.
FIGURE 2. Architecture of an AutoEncoder (AE).
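The encoder and decoder notation of Section III-B can be illustrated with a small sketch. It is only a toy under stated assumptions: one dense layer on each side, sigmoid activations, random illustrative weights, and a mean squared reconstruction error. The helper names (`dense`, `init_layer`, `encode`, `decode`, `reconstruction_error`) are hypothetical and do not come from the paper, whose models use more layers.

```python
import math
import random


def sigmoid(value):
    """Logistic activation, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


def dense(weights, bias, inputs):
    """One fully connected layer: sigmoid(W x + b), one output per weight row."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, bias)
    ]


def init_layer(n_out, n_in, rng):
    """Illustrative random initialisation (an assumption, not the paper's)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def encode(phi, x):
    """Encoder q_phi: map an input x to its latent representation z."""
    weights, bias = phi
    return dense(weights, bias, x)


def decode(theta, z):
    """Decoder p_theta: map a latent representation z back to input space."""
    weights, bias = theta
    return dense(weights, bias, z)


def reconstruction_error(x, x_hat):
    """Mean squared error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


rng = random.Random(0)
n_features, n_latent = 4, 2                    # toy sizes chosen for illustration
phi = init_layer(n_latent, n_features, rng)    # phi = (W, b)
theta = init_layer(n_features, n_latent, rng)  # theta = (W', b')

x = [0.1, 0.9, 0.3, 0.5]
z = encode(phi, x)
x_hat = decode(theta, z)
print("latent representation:", z)
print("reconstruction error:", reconstruction_error(x, x_hat))
```

Training would adjust `phi` and `theta` to reduce the reconstruction error over the whole dataset; that optimisation step is omitted here to keep the sketch short.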
since the distance between multiple layers of the encoders in AE1 and AE2 is evaluated. However, in the predicting phase, only AE2 is used to classify incoming samples in the target domain. Therefore, this model does not increase the predicting time compared to other AE-based models.
V. EXPERIMENTAL SETTING
This section presents the datasets, the performance metrics, the hyper-parameter settings, and the sets of experiments in our paper.
A. DATASETS
To evaluate the performance of MMD-AE, we used nine IoT attack detection datasets from Meidan et al. [3]. These datasets were collected from nine commercial IoT devices in their lab. Each IoT dataset includes five or ten DDoS attacks depending on the type of IoT device, such as scanning the network for vulnerable devices (scan), sending spam data (junk), UDP flooding (udp), TCP flooding (tcp), and sending spam data and opening a connection to a specified IP address and port (combo). Each dataset is divided into a training set (70% of benign data samples and two random types of attacks) and a testing set (30% of benign data samples and the rest of the attacks). Thus, many attack types are not included in the training data. Each data sample has 115 attributes extracted from the packet stream. The sizes of the training and testing sets are presented in Table 1.
TABLE 1. Description of IoT datasets.
Curve (AUC) score. AUC has two advantages. First, it is scale-invariant. In other words, the AUC score measures how well predictions are ranked, rather than their absolute values. Second, AUC is classification-threshold-invariant. It measures the quality of the model's predictions irrespective of what classification threshold is chosen [40].
The Receiver Operating Characteristic (ROC) curve is created by plotting the True Positive Rate (TPR) or Sensitivity¹ against the False Positive Rate (FPR)² at various threshold settings. The area under the ROC curve is the AUC score [40]. It measures the average quality of the classification model at different thresholds.
C. HYPER-PARAMETERS SETTING
The same configuration is used for all AE-based models in our experiments. This configuration is based on the AE-based models for detecting network attacks in the literature [2], [3], [15], [16]. As we integrate the $\ell_{SE}$ loss term into MMD-AE, the number of neurons in the bottleneck layer is equal to the number of classes in the IoT dataset, i.e., 2 neurons in this paper. The number of layers, including both the encoding layers and the decoding layers, is 5. The ADAM algorithm [41] is used for optimizing the models in the training process. The ReLU function is used as the activation function of the AE layers except for the last layers of the encoder and decoder, where the Sigmoid function is used. For all datasets, we select 10% of the training data as the validation set for early stopping. This technique stops the training process automatically. The performance of each model is evaluated on the validation set at the end of every 10 epochs. If the AUC score decreases, the training procedure is stopped.
D. EXPERIMENTAL SETS
We carried out three sets of experiments in this paper. The first set investigates how effective our proposed model is at transferring knowledge from the source domain to the target domain. We compare the MMD distances between the bottleneck layers of the source domain and the target domain after training when the transferring process is executed in one, two, and three encoding layers. The smaller the MMD distance, the more effective the transferring process from the source to the target domain [42].
The second set is the main result of the paper, in which we compare the AUC scores of MMD-AE with AE and two recent DTL models [18], [31]. All methods are trained using the training set, which includes the source dataset with label information and the target dataset without label information. After training, the trained models are evaluated using the target dataset. The methods compared in this experiment include the original AE (i.e., AE), and the DTL model using the
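The MMD distance used in the first experimental set can be sketched as follows. This is a minimal illustration, assuming a Gaussian kernel with bandwidth `sigma = 1.0` and the biased empirical estimator of the squared MMD; the kernel, the bandwidth, the function names, and the toy latent vectors are all assumptions made here, not details taken from the paper.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two vectors of equal length."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def mean_kernel(xs, ys, sigma):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(source, target, sigma=1.0):
    """Biased empirical estimate of the squared MMD between two samples."""
    return (
        mean_kernel(source, source, sigma)
        + mean_kernel(target, target, sigma)
        - 2.0 * mean_kernel(source, target, sigma)
    )


# Toy 2-dimensional "bottleneck" outputs (hypothetical values).
source_latent = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25]]
near_target = [[0.12, 0.18], [0.22, 0.12], [0.14, 0.24]]
far_target = [[0.90, 0.80], [0.85, 0.95], [0.80, 0.90]]

print("MMD^2 (well aligned):", mmd_squared(source_latent, near_target))
print("MMD^2 (poorly aligned):", mmd_squared(source_latent, far_target))
```

A target representation that lies close to the source representation yields a value near zero, which matches the reading above that a smaller MMD distance indicates a more effective transfer.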
1 TPR measures the proportion of actual positive samples that are correctly
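The ranking interpretation of the AUC score given in Section V can be made concrete with a short sketch. It uses the standard equivalence between the area under the ROC curve and the probability that a randomly chosen attack sample receives a higher score than a randomly chosen benign sample, with ties counted as one half. The function name `auc_score` and the toy labels and scores are illustrative assumptions.

```python
def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0 (benign) or 1 (attack).
    scores: predicted scores, where higher means more likely an attack.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")

    correctly_ranked = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correctly_ranked += 1.0
            elif p == n:
                correctly_ranked += 0.5  # ties count as half a correct pair
    return correctly_ranked / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
rescaled = [10.0 * s for s in scores]  # same ranking, different absolute values

print(auc_score(labels, scores))    # 3 of the 4 pairs are ranked correctly
print(auc_score(labels, rescaled))  # unchanged, illustrating scale invariance
```

Because only the ordering of the scores matters, no classification threshold appears anywhere in the computation, which is the threshold invariance mentioned above. The pairwise loop is quadratic in the number of samples and is meant for illustration rather than for datasets of the size used in the paper.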
VI. RESULTS
This section presents the results of the three sets of experiments in our paper.
A. EFFECTIVENESS OF TRANSFERRING INFORMATION IN MMD-AE
MMD-AE implements multiple transfers between the encoding layers of AE1 and AE2 to force the latent representation of AE2 closer to the latent representation of AE1. To evaluate whether MMD-AE achieves this objective, we conducted an experiment in which IoT-1 is selected as the source domain and IoT-2 is the target domain. We measured the MMD distance between the latent representations, i.e., the bottleneck layers, of AE1 and AE2 when the transfer is implemented in one, two, and three layers of the encoders. The smaller the distance is, the more information is transferred from the source domain (AE1) to the target domain (AE2). The result is presented in Fig. 5.
FIGURE 5. MMD of latent representations of the source (IoT-1) and the target (IoT-2) when transferring task on one, two, and three encoding layers.
The figure shows that implementing the transferring task on more layers results in a smaller MMD distance. In other words, more information can be transferred from the source to the target domain when the transferring task is implemented on more encoding layers. This result is evidence that our proposed solution, MMD-AE, is more effective than the previous DTL models, which perform the transferring task only at the bottleneck layer of the AE.
B. PERFORMANCE COMPARISON
Table 2 presents the AUC scores of AE, SKL-AE, SMD-AE, and MMD-AE when they are trained on the dataset with label information in the columns and the dataset without label information in the rows and tested on the dataset in the rows. In this table, the result of MMD-AE is printed in bold face. We can observe that AE is the worst among the tested methods. When an AE is trained on one IoT dataset (the source) and evaluated on other IoT datasets (the target), its performance is poor. The reason is that the predicting data in the target domain is very different from the training data in the source domain.
Conversely, the results of the three DTL models are much better than that of AE. For example, if the source dataset is IoT-1 and the target dataset is IoT-3, the AUC score is improved from 0.600 to 0.745 and 0.764 with SKL-AE and SMD-AE, respectively. These results prove that using DTL helps to improve the accuracy of AEs in detecting IoT attacks on the target domain.
More importantly, our proposed method, i.e., MMD-AE, achieves the highest AUC score on almost all IoT datasets.³ For example, the AUC score is 0.937 compared to 0.600, 0.745, and 0.764 for AE, SKL-AE, and SMD-AE, respectively, when the source dataset is IoT-1 and the target dataset is IoT-3. The results on the other datasets are similar to the results on IoT-3. These results demonstrate that implementing the transferring task in multiple layers of MMD-AE helps the model to transfer the label information from the source to the target domain more effectively. Consequently, MMD-AE often achieves better results than AE, SKL-AE, and SMD-AE in detecting IoT attacks in the target domain.
TABLE 2. AUC scores of AE, SKL-AE, SMD-AE, and MMD-AE on nine IoT datasets.
3 The AUC scores of the proposed model in each scenario are presented in bold.
C. PROCESSING TIME ANALYSIS
FIGURE 6. Training and testing of AE, SKL-AE, SMD-AE, and MMD-AE when the source domain is IoT-2 and the target domain is IoT-1.
4 The results on the other datasets are similar to this result.
Fig. 6 shows the training and the predicting time of the tested models when the source domain is IoT-2 and the target domain is IoT-1.⁴ In this figure, the training time is measured in hours and the predicting time is measured in seconds. It can be seen that the training process of the DTL methods
(i.e., SKL-AE, SMD-AE, and MMD-AE) is more time consuming than that of AE. One of the reasons is that the DTL models need to evaluate the MMD distance between AE1 and AE2 at every iteration, while this calculation is not required in AE. Moreover, the training time of MMD-AE is much higher than those of SKL-AE and SMD-AE since MMD-AE needs to calculate the MMD distance at every encoding layer, whereas SKL-AE and SMD-AE only calculate the distance metric at the bottleneck layer.
However, it is important to note that the predicting time of all DTL methods is almost equal to that of AE. The reason is that the testing samples are only fed to one AE in all tested models. For example, the total predicting times of AE, SKL-AE, SMD-AE, and MMD-AE are 1.001, 1.112, 1.110, and 1.108 seconds, respectively, on the 778,810 testing samples of the IoT-1 dataset.
VII. CONCLUSION
In this paper, we have introduced a novel DTL-based approach for IoT network attack detection, namely MMD-AE. This proposed approach aims to address the problem of "lack of labeled information" for training the detection model in ubiquitous IoT devices. Specifically, the labeled data and unlabeled data are fed into two AE models with the same network structure. Moreover, the MMD metric is used to transfer knowledge from the first AE to the second AE. Compared to the previous DTL models, MMD-AE can operate at all the encoding layers instead of only the bottleneck layer.
We have carried out extensive experiments to evaluate the strength of our proposed model in many scenarios. The experimental results demonstrate that DTL approaches can enhance the AUC score for IoT attack detection. Furthermore, our proposed DTL model, i.e., MMD-AE, which performs the transfer at every encoding layer of the AEs, helps to improve the effectiveness of the transferring process. Thus, the proposed model is meaningful when label information is available in the source domain but not in the target domain.
One limitation of the proposed model is that it requires more time to train. However, the predicting time of MMD-AE is almost the same as that of the other AE-based models. In the future, one can extend our current work in several directions. First, we will distribute the training
process to multiple IoT nodes by using the federated learning technique to speed up this process. Second, the current DTL model is developed based on the AutoEncoder. In the future, we will attempt to extend this model based on other neural networks such as Deep Adaptation Network (DAN), Adversarial Discriminative Domain Adaptation (ADDA), Maximum Classifier Discrepancy (MCD), and Conditional Domain Adversarial Network (CDAN) [43].
REFERENCES
[1] N. C. Luong, D. T. Hoang, P. Wang, D. Niyato, D. I. Kim, and Z. Han, "Data collection and wireless communication in Internet of Things (IoT) using economic analysis and pricing models: A survey," IEEE Commun. Surveys Tuts., vol. 18, no. 4, pp. 2546–2590, Jun. 2016.
[2] Y. Meidan, M. Bohadana, A. Shabtai, M. Ochoa, N. O. Tippenhauer, J. Davis Guarnizo, and Y. Elovici, "Detection of unauthorized IoT devices using machine learning techniques," 2017, arXiv:1709.04647. [Online]. Available: http://arxiv.org/abs/1709.04647
[3] Y. Meidan, M. Bohadana, Y. Mathov, Y. Mirsky, A. Shabtai, D. Breitenbacher, and Y. Elovici, "N-BaIoT—Network-based detection of iot botnet attacks using deep autoencoders," IEEE Pervasive Comput., vol. 17, no. 3, pp. 12–22, Jul./Sep. 2018.
[4] I. Ahmed, A. P. Saleel, B. Beheshti, Z. A. Khan, and I. Ahmad, "Security in the Internet of Things (IoT)," in Proc. 4th HCT Inf. Technol. Trends (ITT), Oct. 2017, pp. 84–90.
[5] N. Vlajic and D. Zhou, "IoT as a land of opportunity for DDoS hackers," Computer, vol. 51, no. 7, pp. 26–34, Jul. 2018.
[6] C. Kolias, G. Kambourakis, A. Stavrou, and J. Voas, "DDoS in the IoT: Mirai and other botnets," Computer, vol. 50, no. 7, pp. 80–84, 2017.
[7] R. Gow, F. A. Rabhi, and S. Venugopal, "Anomaly detection in complex real world application systems," IEEE Trans. Netw. Service Manage., vol. 15, no. 1, pp. 83–96, Mar. 2018.
[8] S. Khattak, N. R. Ramay, K. R. Khan, A. A. Syed, and S. A. Khayam, "A taxonomy of botnet behavior, detection, and defense," IEEE Commun. Surveys Tuts., vol. 16, no. 2, pp. 898–924, 2nd Quart., 2014.
[9] J. Dromard, G. Roudiere, and P. Owezarski, "Online and scalable unsupervised network anomaly detection method," IEEE Trans. Netw. Service Manage., vol. 14, no. 1, pp. 34–47, Mar. 2017.
[10] H. Bahsi, S. Nomm, and F. B. La Torre, "Dimensionality reduction for machine learning based IoT botnet detection," in Proc. 15th Int. Conf. Control, Autom., Robot. Vis. (ICARCV), Nov. 2018, pp. 1857–1862.
[11] C. Zhang and R. C. Green II, "Communication security in Internet of Thing: Preventive measure and avoid DDoS attack over IoT network," in Proc. 18th Symp. Commun. Netw., Alexandria, VA, USA, Apr. 2015, pp. 8–15.
[12] C. Dietz, R. L. Castro, J. Steinberger, C. Wilczak, M. Antzek, A. Sperotto, and A. Pras, "IoT-botnet detection and isolation by access routers," in Proc. 9th Int. Conf. Netw. Future (NOF), Nov. 2018, pp. 88–95.
[13] M. Nobakht, V. Sivaraman, and R. Boreli, "A host-based intrusion detection and mitigation framework for smart home IoT using OpenFlow," in Proc. 11th Int. Conf. Availability, Rel. Secur. (ARES), Salzburg, Austria, Aug. 2016, pp. 147–156.
[14] J. Ceron, K. Steding-Jessen, C. Hoepers, L. Granville, and C. Margi, "Improving IoT botnet investigation using an adaptive network layer," Sensors, vol. 19, no. 3, p. 727, Feb. 2019.
[15] L. Vu, V. L. Cao, Q. U. Nguyen, D. N. Nguyen, D. T. Hoang, and E. Dutkiewicz, "Learning latent distribution for distinguishing network traffic in intrusion detection system," in Proc. IEEE Int. Conf. Commun. (ICC), May 2019, pp. 1–6.
[16] V. C. Loi, M. Nicolau, and J. McDermott, "Learning neural representations for network anomaly detection," IEEE Trans. Cybern., vol. 49, no. 8, pp. 3074–3087, Aug. 2019.
[17] O. Ibidunmoye, A.-R. Rezaie, and E. Elmroth, "Adaptive anomaly detection in performance metric streams," IEEE Trans. Netw. Service Manage., vol. 15, no. 1, pp. 217–231, Mar. 2018.
[18] L. Wen, L. Gao, and X. Li, "A new deep transfer learning based on sparse auto-encoder for fault diagnosis," IEEE Trans. Syst., Man, Cybern. Syst., vol. 49, no. 1, pp. 136–144, Jan. 2019.
[19] S. Jialin Pan and Q. Yang, "A survey on transfer learning," IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010.
[20] J. Lu, V. Behbood, P. Hao, H. Zuo, S. Xue, and G. Zhang, "Transfer learning using computational intelligence: A survey," Knowl.-Based Syst., vol. 80, pp. 14–23, May 2015.
[21] A. L. Buczak and E. Guven, "A survey of data mining and machine learning methods for cyber security intrusion detection," IEEE Commun. Surveys Tuts., vol. 18, no. 2, pp. 1153–1176, 2nd Quart., 2016.
[22] S. García, A. Zunino, and M. Campo, "Botnet behavior detection using network synchronism," in Privacy, Intrusion Detection and Response: Technologies for Protecting Networks. Hershey, PA, USA: IGI Global, 2012, pp. 122–144.
[23] W. Zellinger, T. Grubinger, E. Lughofer, T. Natschläger, and S. Saminger-Platz, "Central moment discrepancy (CMD) for domain-invariant representation learning," 2017, arXiv:1702.08811. [Online]. Available: http://arxiv.org/abs/1702.08811
[24] A. Nisioti, A. Mylonas, P. D. Yoo, and V. Katos, "From intrusion detection to attacker attribution: A comprehensive survey of unsupervised methods," IEEE Commun. Surveys Tuts., vol. 20, no. 4, pp. 3369–3388, Jul. 2018.
[25] Y. Xu, S. J. Pan, H. Xiong, Q. Wu, R. Luo, H. Min, and H. Song, "A unified framework for metric transfer learning," IEEE Trans. Knowl. Data Eng., vol. 29, no. 6, pp. 1158–1171, Jun. 2017.
[26] K. Weiss, T. M. Khoshgoftaar, and D. Wang, "A survey of transfer learning," J. Big Data, vol. 3, no. 1, May 2016.
[27] C. Tan, F. Sun, T. Kong, W. Zhang, C. Yang, and C. Liu, "A survey on deep transfer learning," in Proc. Int. Conf. Artif. Neural Netw. Rhodes, Greece: Springer, Oct. 2018, pp. 270–279.
[28] S. Gou, Y. Wang, L. Jiao, J. Feng, and Y. Yao, "Distributed transfer network learning based intrusion detection," in Proc. IEEE Int. Symp. Parallel Distrib. Process. Appl., Aug. 2009, pp. 511–515.
[29] J. Zhao, S. Shetty, J. W. Pan, C. Kamhoua, and K. Kwiat, "Transfer learning for detecting unknown network attacks," EURASIP J. Inf. Secur., vol. 2019, p. 1, Feb. 2019.
[30] I. T. Jolliffe, Principal Component Analysis. 2nd ed. New York, NY, USA: Springer-Verlag, 2002.
[31] F. Zhuang, X. Cheng, P. Luo, S. J. Pan, and Q. He, "Supervised representation learning: Transfer learning with deep autoencoders," in Proc. 24th Int. Joint Conf. Artif. Intell., 2015, pp. 4119–4125.
[32] C. Kandaswamy, L. M. Silva, L. A. Alexandre, R. Sousa, J. M. Santos, and J. M. de Sa, "Improving transfer learning accuracy by reusing stacked denoising autoencoders," in Proc. IEEE Int. Conf. Syst., Man, Cybern. (SMC), Oct. 2014, pp. 1380–1387.
[33] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016. [Online]. Available: http://www.deeplearningbook.org
[34] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, "Greedy layer-wise training of deep networks," in Proc. Adv. Neural Inf. Process. Syst., 2007, pp. 153–160.
[35] A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf, and A. J. Smola, "A Kernel method for the two-sample-problem," in Proc. Adv. Neural Inf. Process. Syst., 2007, pp. 513–520.
[36] P. Yang, F. Luo, S. Wu, J. Xu, and D. Zhang, "Learning unsupervised word mapping via maximum mean discrepancy," in Proc. CCF Int. Conf. Natural Lang. Process. Chin. Comput. Dunhuang, China: Springer, Oct. 2019, pp. 290–302.
[37] S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, "Domain adaptation via transfer component analysis," IEEE Trans. Neural Netw., vol. 22, no. 2, pp. 199–210, Feb. 2011.
[38] (2020). Tcptrace Tool for Analysis of TCP Dump Files. [Online]. Available: http://www.tcptrace.org/
[39] (2020). Wireshark Tool, the World's Foremost and Widely-Used Network Protocol Analyzer. [Online]. Available: https://www.wireshark.org/
[40] D. J. Hand and R. J. Till, "A simple generalisation of the area under the ROC curve for multiple class classification problems," Mach. Learn., vol. 45, no. 2, pp. 171–186, 2001.
[41] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," 2014, arXiv:1412.6980. [Online]. Available: http://arxiv.org/abs/1412.6980
[42] J. Yang, R. Yan, and A. G. Hauptmann, "Cross-domain video concept detection using adaptive svms," in Proc. 15th Int. Conf. Multimedia (MULTIMEDIA), 2007, pp. 188–197.
[43] Y. Zhang, T. Liu, M. Long, and M. Jordan, "Bridging theory and algorithm for domain adaptation," in Proc. 36th Int. Conf. Mach. Learn., 2019, pp. 7404–7413.
LY VU received the M.S. degree from Inha University, South Korea, in 2014. She is currently pursuing the Ph.D. degree in the major of mathematics theory for information technology with Le Quy Don Technical University, Vietnam. Her research interests include data mining, machine learning, deep learning, and network security.
QUANG UY NGUYEN received the Ph.D. degree from University College Dublin, Ireland, in 2011. He is currently a Senior Lecturer with Le Quy Don Technical University (LQDTU), where he is also the Director of the Machine Learning and Applications Research Group. His research interests include machine learning, computer vision, information security, evolutionary algorithms, and genetic programming.
DINH THAI HOANG (Member, IEEE) received the Ph.D. degree in computer science and engineering from Nanyang Technological University, Singapore, in 2016. He is currently a Faculty Member of the School of Electrical and Data Engineering, University of Technology Sydney, Australia. His research interests include emerging topics in wireless communications and networking such as ambient backscatter communications, vehicular communications, cybersecurity, the IoT, and 5G networks. He is currently an Editor of the IEEE WIRELESS COMMUNICATIONS LETTERS and the IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING. He was an Exemplary Reviewer of the IEEE TRANSACTIONS ON COMMUNICATIONS, in 2018, and the IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, in 2017 and 2018.