
Received April 8, 2020, accepted May 24, 2020, date of publication June 8, 2020, date of current version June 18, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.3000476

Deep Transfer Learning for IoT Attack Detection


LY VU 1, QUANG UY NGUYEN 1, DIEP N. NGUYEN 2 (Senior Member, IEEE),
DINH THAI HOANG 2 (Member, IEEE), AND ERYK DUTKIEWICZ 2 (Senior Member, IEEE)
1 Faculty of Information Technology, Le Quy Don Technical University, Hanoi 11917, Vietnam
2 School of Electrical and Data Engineering, University of Technology Sydney, Ultimo, NSW 2007, Australia
Corresponding author: Quang Uy Nguyen ([email protected])
This work was supported by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under Grant
102.05-2019.05.

ABSTRACT The digital revolution has substantially changed our lives in which Internet-of-Things (IoT)
plays a prominent role. The rapid development of IoT to most corners of life, however, leads to various
emerging cybersecurity threats. Therefore, detecting and preventing potential attacks in IoT networks have
recently attracted paramount interest from both academia and industry. Among various attack detection
approaches, machine learning-based methods, especially deep learning, have demonstrated great potential
thanks to their early-detection capability. However, these machine learning techniques only work well when
a huge volume of labeled data from IoT devices can be collected. Moreover, the labeling
process is usually time-consuming and expensive, so it may not be able to adapt to the quickly evolving IoT
attacks seen in practice. In this paper, we propose a novel deep transfer learning (DTL) method that can learn
from data collected from multiple IoT devices, not all of which are labeled. Specifically, we develop a
DTL model based on two AutoEncoders (AEs). The first AE (AE1) is trained on the source datasets (source
domains) in a supervised mode using the label information, and the second AE (AE2) is trained on the target
datasets (target domains) in an unsupervised manner without label information. The transfer learning process
attempts to force the latent representation (the bottleneck layer) of AE2 to be similar to the latent representation
of AE1. After that, the latent representation of AE2 is used to detect attacks in the incoming samples in the
target domain. We carry out intensive experiments on nine recent IoT datasets to evaluate the performance
of the proposed model. The experimental results demonstrate that the proposed DTL model significantly
improves the accuracy in detecting IoT attacks compared to the baseline deep learning technique and two
recent DTL approaches.

INDEX TERMS Deep transfer learning, IoT, cyberattack detection, AutoEncoder.

I. INTRODUCTION
The Internet-of-Things (IoT) refers to connected devices, sensors, and actuators used in vehicles, electronic appliances, buildings, and structures. As sensors, data storage, and the Internet become cheaper, faster, and more integrated, IoT devices will find more and more applications [1] (e.g., in smart buildings, smart cities, intelligent transportation systems, and healthcare). The rapid development of IoT to most corners of life, however, leads to various emerging cybersecurity threats. This is because IoT devices are often limited in computing capability and energy, making them particularly vulnerable to adversaries. IoT devices are more exposed to, and unfortunately more difficult to protect from, cyber attacks than computers [2], [3]. Consequently, detecting attacks to protect IoT devices from malicious behaviors is critical to broadening the applications of IoT [4]–[7].

IoT attack detection methods can be categorized into signature-based and machine learning-based methods [8]–[10]. The signature-based methods [11]–[14] seek to find the signatures of IoT attacks in the incoming traffic. These methods require extensive prior knowledge of known IoT attacks to define the signatures. The machine learning-based methods, on the other hand, attempt to learn the features of normal and malicious data in the training/offline phase. In the predicting/online phase, these models are used to detect attacks in the incoming traffic. Thanks to their capability to automatically and progressively learn useful information/features from collected data, machine learning-based methods can detect various IoT attacks early [3], [9], [15]–[17].

The associate editor coordinating the review of this manuscript and approving it for publication was Omid Kavehei.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 8, 2020 107335
L. Vu et al.: DTL for IoT Attack Detection

However, the machine learning-based methods only perform well under an important assumption, i.e., the distributions of the training data and the predicting data are similar [18]. Nevertheless, in many practical applications, this assumption may not always hold [19], [20]. In network security especially, new types of attacks (e.g., zero-day attacks) can be found on a daily basis [16]. As such, the practical IoT data seen by machine learning models (in the predicting/online phase) is usually very different from the data used during the training/offline phase. To alleviate this problem, a large volume of labeled training data from multiple IoT devices is often required. However, manually labeling a huge volume of data is very time-consuming and expensive [21], [22]. This limits the practical deployment of machine learning-based methods in detecting IoT attacks in various scenarios.

Given the above, this work proposes a novel deep transfer learning (DTL) approach based on AutoEncoder (AE) to enable further applications of machine learning in IoT attack detection. The proposed model is referred to as Multi-Maximum Mean Discrepancy AE (MMD-AE). MMD-AE can be trained on a dataset including both labeled samples (in the source domain) and unlabeled samples (in the target domain). After training, MMD-AE is used to predict IoT attacks in the incoming traffic in the target domain. Specifically, MMD-AE consists of two AEs: AE1 and AE2. AE1 is trained with labeled data while AE2 is trained on the unlabeled data. The whole model, i.e., MMD-AE, is trained to drive the latent representation of AE2 close to the latent representation of AE1. As a result, the latent representation of AE2 can be used to classify the unlabeled IoT data in the target domain. The major contributions of this paper are as follows:
• We propose a novel DTL model based on AEs, i.e., MMD-AE, that transfers knowledge, i.e., label information, from the source domain to the target domain. This model helps to lessen the problem of "lack of label information" in traffic datasets collected from IoT devices.
• We introduce the Maximum Mean Discrepancy (MMD) metric to minimize the distance between multiple hidden layers of AE1 and multiple hidden layers of AE2. This metric helps to improve the effectiveness of the knowledge transferred from the source to the target domain in IoT attack detection systems.
• We evaluate our proposed method using nine IoT attack datasets and compare its performance with the canonical deep learning model and the state-of-the-art TL models [18], [31]. The experimental results demonstrate the advantage of our proposed model over the other tested methods.

The rest of the paper is organized as follows. Section II highlights recent works on IoT attack detection. In Section III, we define a DTL model and briefly describe the AE architecture. The proposed model is then presented in Section IV. Section V discusses the experiment settings and Section VI provides detailed analysis and discussion related to experimental results. Finally, Section VII concludes with future work.

II. RELATED WORK
There are two main directions for cyberattack detection, i.e., signature-based and machine learning-based approaches, e.g., [8]–[10], [21]. The signature-based methods maintain a database of predefined signatures (i.e., patterns) that correspond to known IoT attacks and perform the detection task by comparing these to the incoming data stream [11]–[13], [24]. Zhang and Green II [11] proposed a lightweight and low-complexity algorithm to prevent Distributed Denial of Service (DDoS) attacks in which each IoT working node performs deep packet inspection to find attack signatures. If a sender repeatedly sends requests with the same content, the requests are flagged as malicious. Dietz et al. [12] proposed a solution to proactively block the spreading of IoT attacks and isolate vulnerable IoT devices. Each IoT device is verified in two steps, i.e., scanning for open ports and services and using a predefined list of commonly known credentials to check authentication. After that, a list of predefined rules is used to isolate the vulnerable IoT devices. Nobakht et al. [13] proposed a solution for IoT attack detection using Software Defined Network with the OpenFlow protocol to address malicious behaviours and block intruders from accessing the IoT devices. This method incorporates a database of all known in-home IoT devices along with the corresponding patterns of potential security risks. Then, the detection method simply matches the IoT traffic against the signatures of security risks stored in the database. The advantage of the signature-based methods is that they provide an attack detection system with a low false positive rate [24]. However, they require prior human knowledge about the behaviours of known IoT attacks to design the database of attack signatures. Thus, the accuracy of these methods depends on the quality of the signature databases. Moreover, as the size of the databases increases, the processing time (i.e., search time) can become excessive [24].

The machine learning-based methods first train the detection models from collected data samples in IoT networks. Then, the trained models are used to classify the new incoming IoT data samples into normal or attack data. The popular traditional machine learning algorithms for IoT attack detection are Decision Tree (C4.5), Support Vector Machine (SVM), K-Nearest Neighbour, Bayes Classifier, and Neural Networks [8], [24]. Recently, the deep learning approach has been widely used and has achieved high performance in detecting cyberattacks [3], [9], [15]–[17]. Among deep learning approaches, AE-based models project the original data to a new latent representation space to improve the accuracy in detection tasks [3], [15], [16]. Nevertheless, to train a good machine learning model for detecting IoT attacks, it is usually required to label a huge volume of training data as normal or attack [24]. Moreover, general machine learning models often need to assume that the data distribution of training datasets
is similar to the data distribution of predicting datasets. This assumption, however, is usually not practical [19], [20], [25].

Recently, DTL techniques have been used to handle the above issues of machine learning methods where training data from a source domain and test data from a target domain are drawn from different distributions. A DTL model attempts to reduce the distribution divergence between the source domain and the target domain [25]. As a result, the knowledge learnt for a task (e.g., classification) on the source domain can be used to support the learning task on the similar target domain [19], [25]–[27]. Gou et al. [28] applied an instance-based DTL approach in network intrusion detection that requires label information from the target domain. Zhao et al. [29] proposed a feature-based DTL technique that projects the source and the target domain into a latent subspace via linear transformations, i.e., Principal Component Analysis (PCA), for network attack detection. However, PCA is a linear mapping technique that only works well with a simple data feature set [30].

Our proposed DTL model in this paper, i.e., MMD-AE, leverages a non-linear mapping, i.e., AE, to improve the performance of IoT attack detection on the target domain. The key idea of our proposed DTL (compared with previous AE-based DTL methods [18], [31]) is that the knowledge of features in every encoding layer (instead of only the bottleneck layer as in previous works) is transferred to the target domain. This helps to force the latent representation of the target domain to be similar to the latent representation of the source domain. The experimental results illustrate the effectiveness of our proposed DTL model on the IoT attack detection task in the target domain.

III. FUNDAMENTAL BACKGROUND
This section presents the fundamental background of our proposed model.

A. TRANSFER LEARNING
Transfer learning (TL) refers to the situation where what has been learned in one learning task is exploited to improve generalization in another learning task [33]. Fig. 1 compares traditional machine learning methods, including deep learning, with TL models. In traditional machine learning, the datasets and training processes are separated for different learning tasks. Thus, no knowledge is retained/accumulated or transferred from one model to another. In TL, the knowledge (i.e., features, weights, etc.) from previously trained models in a source domain is used for training newer models in a target domain. Moreover, TL can even handle the problems of having less data or no label information in the target domain.

FIGURE 1. Traditional machine learning vs. transfer learning.

TL is often used to transfer knowledge learnt from a source domain to a target domain where the target domain is different from the source domain but the two have related data distributions. We consider a TL method with an input space $X$ and its label space $Y$; the two domain distributions are the source domain $D_S$ and the target domain $D_T$. Two corresponding samples are given, i.e., the source sample $D_S = (X_S, Y_S) = \{(x_S^i, y_S^i)\}_{i=1}^{n_S}$ and the target sample $D_T = (X_T) = \{x_T^i\}_{i=1}^{n_T}$, where $n_S$ and $n_T$ are the number of samples in the source domain and the target domain, respectively. In this paper, the TL model based on a deep neural network, i.e., deep transfer learning (DTL), is trained on the labeled data in the source domain and the unlabeled data in the target domain. After that, the trained model is used for IoT attack detection in the target domain.

B. AUTOENCODERS
This subsection describes the structure and the training process of an AutoEncoder (AE), which is fundamental to our DTL model. We develop the TL models based on AE because these models have been reported as the most effective deep neural networks for IoT attack detection [2], [3], [15], [16]. Additionally, to demonstrate the effectiveness of the proposed model, we will compare it with previous DTL techniques that are also based on AE.

FIGURE 2. Architecture of an AutoEncoder (AE).

An AE is a neural network trained to reconstruct the network's input at its output [34]. This network has two parts, i.e., encoder and decoder, as shown in Fig. 2. Let $W$, $W'$, $b$, and $b'$ denote the weight matrices and the bias vectors of the encoder and the decoder, respectively, and let $X = \{x^1, x^2, \ldots, x^n\}$ be a training dataset. $\phi = (W, b)$ and $\theta = (W', b')$ are the parameter sets for training the encoder and the decoder, respectively. Let $q_\phi$ denote the encoder and $z^i$ denote the representation of the input data $x^i$. The encoder maps the input $x^i$ to the latent representation $z^i$ (as in (1)). The decoder $p_\theta$ attempts to map the latent representation $z^i$ back
into the input space. Therefore, the output of the decoder, i.e., $\hat{x}^i$, lies in the input space (as in (2)).
$$z^i = q_\phi(x^i) = a_f(W x^i + b), \tag{1}$$
$$\hat{x}^i = p_\theta(z^i) = a_g(W' z^i + b'), \tag{2}$$
where $a_f$ and $a_g$ are the activation functions of the encoder and the decoder, respectively. Fig. 2 shows an example of an AE with input dimension $n$, 5 layers, and a bottleneck layer of size 2.
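
The mappings in (1) and (2) can be illustrated with a minimal pure-Python sketch of a single-layer encoder and decoder. The function names, the toy weights, and the use of the sigmoid for both $a_f$ and $a_g$ are assumptions made for illustration only; they are not taken from the authors' implementation.

```python
import math


def sigmoid(v):
    """Element-wise logistic function, used here as both a_f and a_g (an assumption)."""
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def affine(W, x, b):
    """Compute W x + b for a matrix W (list of rows), a vector x, and a bias vector b."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def encode(x, W, b):
    """Eq. (1): z = a_f(W x + b)."""
    return sigmoid(affine(W, x, b))


def decode(z, W_prime, b_prime):
    """Eq. (2): x_hat = a_g(W' z + b')."""
    return sigmoid(affine(W_prime, z, b_prime))


# Toy example: 3 input features and a bottleneck of size 2 (hypothetical weights).
W = [[0.5, -0.2, 0.1],
     [0.3, 0.8, -0.5]]
b = [0.0, 0.1]
W_prime = [[0.4, -0.6],
           [0.7, 0.2],
           [-0.3, 0.9]]
b_prime = [0.0, 0.0, 0.0]

x = [0.2, 0.9, 0.4]
z = encode(x, W, b)                   # latent representation, length 2
x_hat = decode(z, W_prime, b_prime)   # reconstruction, length 3
print(z, x_hat)
```

A deeper AE would stack several such layers on each side of the bottleneck; the single-layer form is kept here only because it matches (1) and (2) directly.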
The AE model is trained by minimizing a loss function called the Reconstruction Error (RE). RE is the difference between the input $x^i$ and the output $\hat{x}^i$ as in (3). This term encourages the decoder to learn to reconstruct the original data. If the decoder's output does not reconstruct the data well, it will incur a large cost in this loss term.
$$\ell_{AE}(x^i, \phi, \theta) = \frac{1}{n} \sum_{i=1}^{n} l(x^i, \hat{x}^i), \tag{3}$$
where $l(x^i, \hat{x}^i)$ measures the difference between the input $x^i$ and the output $\hat{x}^i$. In the AE model, the mean squared error (MSE) is commonly used [16].

C. MAXIMUM MEAN DISCREPANCY (MMD)
Maximum mean discrepancy (MMD) is a metric used to estimate the discrepancy between two distributions. MMD is more flexible than the Kullback-Leibler (KL) divergence [31] thanks to its ability to estimate a nonparametric distance [35]. Moreover, MMD does not require computing the intermediate density of the distributions, thus avoiding the need for sophisticated optimization [36]. The MMD of two datasets can be formulated as (4) [37].
$$\mathrm{MMD}(X_S, X_T) = \left\| \frac{1}{n_S} \sum_{i=1}^{n_S} \xi_S(x_S^i) - \frac{1}{n_T} \sum_{i=1}^{n_T} \xi_T(x_T^i) \right\|_{\mathcal{H}}, \tag{4}$$
where $n_S$ and $n_T$ are the number of samples of the source and target domain, respectively. $\xi_S$ and $\xi_T$ denote the representation of the source data, i.e., $x_S^i$, and the target data, i.e., $x_T^i$, respectively. $\| \cdot \|_{\mathcal{H}}$ represents the 2-norm operation in the Reproducing Kernel Hilbert Space (RKHS) [37].

IV. PROPOSED TRANSFER LEARNING APPROACH FOR IoT CYBERATTACK DETECTION
This section presents our proposed DTL model for IoT attack detection. We first give an overview of the system structure. After that, the DTL model is discussed in detail.

A. SYSTEM STRUCTURE
Fig. 3 presents the system structure that uses DTL for IoT attack detection. First, the data collection module gathers data from all IoT devices. The training data consists of both labeled and unlabeled data. The labeled data is collected from some IoT devices which are dedicated to labeling data. The labeling process is usually executed in two steps [22]: each data sample is extracted from captured packets using the Tcptrace tool [38], then the data sample is labeled as a normal sample or an attack sample by manually analyzing the flow using Wireshark software [39]. Usually, the number of IoT devices with labeled data is much smaller than the number of IoT devices with unlabeled data. Second, the collected data is passed to the DTL model for training. The training process attempts to transfer the knowledge learnt from the data with label information to the data without label information. This is achieved by minimizing the difference between the latent representations of the source data and the target data. After training, the trained DTL model is used in the detection module, which can classify incoming traffic from all IoT devices as normal or attack data. The detailed description of the DTL model is presented in the next subsection.

FIGURE 3. Proposed system structure.

B. TRANSFER LEARNING MODEL
The proposed DTL (i.e., MMD-AE) model includes two AEs (i.e., AE1 and AE2) that have the same architecture, as shown in Fig. 4. The input of AE1 is the data samples from the source domain ($x_S^i$) while the input of AE2 is the data samples from the target domain ($x_T^i$). The training process attempts to minimize the MMD-AE loss function. This loss function includes three terms: the reconstruction error ($\ell_{RE}$) term, the supervised ($\ell_{SE}$) term, and the Multi-Maximum Mean Discrepancy ($\ell_{MMD}$) term.

We assume that $\phi_S$, $\theta_S$, $\phi_T$, $\theta_T$ are the parameter sets of the encoder and decoder of AE1 and AE2, respectively. The first term, $\ell_{RE}$, which includes $RE_S$ and $RE_T$ in Fig. 4, attempts to reconstruct the input layers at the output layers of both AEs. In other words, $RE_S$ and $RE_T$ try to reconstruct the input data $x_S$ and $x_T$ at their outputs from the latent representations $z_S$ and $z_T$, respectively. Thus, this term encourages the two AEs to retain the useful information of the original data in the latent representation. Consequently, we can use the latent representations for classification tasks after training. Formally, the $\ell_{RE}$ term is calculated as follows:
$$\ell_{RE}(x_S^i, \phi_S, \theta_S, x_T^i, \phi_T, \theta_T) = l(x_S^i, \hat{x}_S^i) + l(x_T^i, \hat{x}_T^i), \tag{5}$$
where $l$ is the MSE function [16], and $x_S^i$, $\hat{x}_S^i$, $x_T^i$, $\hat{x}_T^i$ are the data samples at the input layers and the output layers of the source domain and the target domain, respectively.

FIGURE 4. Architecture of MMD-AE.

The second term, $\ell_{SE}$, aims to train a classifier at the latent representation of AE1 using label information in the source domain. In other words, this term attempts to map the values at the two neurons of the bottleneck layer of AE1, i.e., $z_S$, to their label information $y_S$. This is achieved by using the softmax function [33] to minimize the difference between $z_S$ and $y_S$. It should be noted that the number of neurons in the bottleneck layer must be the same as the number of classes in the source domain. This loss encourages the latent representation space to separate the class labels. Formally, this loss is defined as follows:
$$\ell_{SE}(x_S^i, y_S^i, \phi_S, \theta_S) = -\sum_{j=1}^{C} y_S^{i,j} \log(z_S^{i,j}), \tag{6}$$
where $z_S^i$ and $y_S^i$ are the latent representation and the label of the source data sample $x_S^i$. $y_S^{i,j}$ and $z_S^{i,j}$ represent the $j$-th element of the vectors $y_S^i$ and $z_S^i$, respectively.

The third term, $\ell_{MMD}$, transfers the knowledge of the source domain to the target domain. The transferring process is executed by minimizing the MMD distances between every encoding layer of AE1 and the corresponding encoding layer of AE2. This term aims to make the representations of the source data and target data close together. The $\ell_{MMD}$ loss term is described as follows:
$$\ell_{MMD}(x_S^i, \phi_S, \theta_S, x_T^i, \phi_T, \theta_T) = \sum_{k=1}^{K} \mathrm{MMD}(\xi_S^k(x_S^i), \xi_T^k(x_T^i)), \tag{7}$$
where $K$ is the number of encoding layers in the AE-based model. $\xi_S^k(x_S^i)$ and $\xi_T^k(x_T^i)$ are the $k$-th encoding layers of AE1 and AE2, respectively, and $\mathrm{MMD}(\cdot, \cdot)$ is the MMD distance presented in (4).

The final loss function of MMD-AE combines the loss terms in (5), (6), and (7) as in (8).
$$\ell = \ell_{SE} + \ell_{RE} + \ell_{MMD}. \tag{8}$$
Algorithm 1 presents the pseudo-code for training our proposed DTL model. The training samples with labels in the source domain are input to AE1 while the training samples without labels in the target domain are input to AE2. The training process attempts to minimize the loss function in (8). After training, AE2 is used to classify the testing samples in the target domain as in Algorithm 2.

Algorithm 1 Training the Proposed DTL Model
INPUT:
  x_S, y_S: Training data samples and corresponding labels in the source domain
  x_T: Training data samples in the target domain
OUTPUT: Trained models: AE1, AE2.
BEGIN:
  1. Put x_S to the input of AE1
  2. Put x_T to the input of AE2
  3. ξ^k(x_S) is the representation of x_S at layer k of AE1
  4. z_S is the representation of x_S at the bottleneck layer of AE1
  5. ξ^k(x_T) is the representation of x_T at layer k of AE2
  6. Train the TL model by minimizing the loss function in (8)
  return Trained models: AE1, AE2.
END.

Algorithm 2 Classifying on the Target Domain
INPUT:
  x_T: Testing data samples in the target domain
  Trained AE2 model
OUTPUT: y_T: Label of x_T
BEGIN:
  1. Put x_T to the input of AE2
  2. z_T is the representation of x_T at the bottleneck layer of AE2
  3. y_T = softmax(z_T)
  return y_T
END.

Our key idea in the proposed model, i.e., MMD-AE, compared with the previous DTL models [18], [31], is to transfer the knowledge not only in the bottleneck layer but also in every encoding layer from the source domain, i.e., AE1, to the target domain, i.e., AE2. In other words, MMD-AE transfers more knowledge from the source domain to the target domain. One possible limitation of MMD-AE is that it may incur overhead time in the training process
since the distance between multiple layers of the encoders in AE1 and AE2 is evaluated. However, in the predicting phase, only AE2 is used to classify incoming samples in the target domain. Therefore, this model does not increase the predicting time compared to other AE-based models.

V. EXPERIMENTAL SETTING
This section presents the datasets, the performance metrics, the hyper-parameter settings, and the sets of experiments in our paper.

A. DATASETS
To evaluate the performance of MMD-AE we used nine IoT attack detection datasets from Meidan et al. [3]. These datasets were collected from nine commercial IoT devices in their lab. Each IoT dataset includes five or ten DDoS attacks depending on the type of IoT device, such as scanning the network for vulnerable devices (scan), sending spam data (junk), UDP flooding (udp), TCP flooding (tcp), and sending spam data and opening a connection to a specified IP address and port (combo). Each dataset is divided into a training set (70% of the benign data samples and two random types of attacks) and a testing set (30% of the benign data samples and the rest of the attacks). Thus, many attack types are not included in the training data. Each data sample has 115 attributes extracted from the packet stream. The number of training and testing samples is presented in Table 1.

TABLE 1. Description of IoT datasets.

B. EVALUATION METRIC
To evaluate the effectiveness of the proposed model, we use a popular performance metric, i.e., the Area Under the Curve (AUC) score. The advantage of AUC includes two aspects. First, it is scale-invariant. In other words, the AUC score measures how well predictions are ranked, rather than their absolute values. Second, AUC is classification-threshold-invariant. It measures the quality of the model's predictions irrespective of what classification threshold is chosen [40].

The AUC score is obtained by plotting the True Positive Rate (TPR), or Sensitivity (footnote 1), against the False Positive Rate (FPR) (footnote 2) at various threshold settings. The area under the resulting ROC curve is the AUC score [40]. This measures the average quality of the classification model at different thresholds.

1 TPR measures the proportion of actual positive samples that are correctly identified.
2 FPR measures the ratio between the number of negative samples wrongly categorized as positive samples (false positives) and the total number of actual negative samples.

C. HYPER-PARAMETERS SETTING
The same configuration is used for all AE-based models in our experiments. This configuration is based on the AE-based models for detecting network attacks in the literature [2], [3], [15], [16]. As we integrate the $\ell_{SE}$ loss term into MMD-AE, the number of neurons in the bottleneck layer is equal to the number of classes in the IoT dataset, i.e., 2 neurons in this paper. The number of layers, including both the encoding layers and the decoding layers, is 5. The ADAM algorithm [41] is used for optimizing the models in the training process. The ReLU function is used as the activation function of the AE layers, except for the last layers of the encoder and decoder, where the Sigmoid function is used. For all datasets, we select 10% of the training data as the validation set for early stopping. This technique helps to stop the training process automatically. The performance of each model is evaluated on the validation set at the end of every 10 epochs. If the AUC score is reduced, the training procedure is stopped.

D. EXPERIMENTAL SETS
We carried out three sets of experiments in this paper. The first set investigates how effective our proposed model is at transferring knowledge from the source domain to the target domain. We compare the MMD distances between the bottleneck layers of the source domain and the target domain after training when the transferring process is executed in one, two, and three encoding layers. The smaller the MMD distance, the more effective the transfer from the source to the target domain [42].

The second set is the main result of the paper, in which we compare the AUC scores of MMD-AE with AE and two recent DTL models [18], [31]. All methods are trained using the training set, which includes the source dataset with label information and the target dataset without label information. After training, the trained models are evaluated using the target dataset. The methods compared in this experiment include the original AE (i.e., AE), the DTL model using the
KL metric at the bottleneck layer (i.e., SKL-AE), the DTL method using the MMD metric at the bottleneck layer (i.e., SMD-AE), and our model (MMD-AE).

The third set measures the processing time of the training and the predicting processes of the above methods. The detailed results of the three experimental sets are presented in the next section.
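
The AUC metric of Section V-B can be computed directly from its ranking interpretation: it equals the fraction of (attack, benign) pairs in which the attack sample receives the higher score, with ties counted as one half. The sketch below is a minimal pure-Python illustration of that idea. The function name `auc_score` and the toy labels and scores are assumptions for illustration; they are not data or code from the experiments in this paper.

```python
def auc_score(labels, scores):
    """Area under the ROC curve computed by pairwise ranking.

    labels: 1 for attack (positive), 0 for benign (negative).
    scores: higher values mean "more likely an attack".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # attack ranked above benign
            elif p == n:
                wins += 0.5   # a tie counts as half
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_score(labels, scores))  # 0.75

# Scale invariance: rescaling the scores by a positive factor keeps the ranking,
# so the AUC is unchanged.
print(auc_score(labels, [10 * s for s in scores]))  # 0.75
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a small illustration; a sort-based computation would be preferable for datasets of realistic size.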

VI. RESULTS
This section presents the results of the three sets of experiments in our paper.

A. EFFECTIVENESS OF TRANSFERRING
INFORMATION IN MMD-AE
MMD-AE implements multiple transfers between the encoding layers of AE1 and AE2 to force the latent representation of AE2 closer to the latent representation of AE1. To evaluate whether MMD-AE achieves this objective, we conducted an experiment in which IoT-1 is selected as the source domain and IoT-2 as the target domain. We measured the MMD distance between the latent representations, i.e., the bottleneck layers, of AE1 and AE2 when the transfer is implemented on one, two, and three layers of the encoders. The smaller the distance is, the more information is transferred from the source domain (AE1) to the target domain (AE2). The result is presented in Fig. 5.

FIGURE 5. MMD of latent representations of the source (IoT-1) and the target (IoT-2) when the transferring task is implemented on one, two, and three encoding layers.

The figure shows that implementing the transferring task on more layers results in a smaller MMD distance. In other words, more information can be transferred from the source to the target domain when the transferring task is implemented on more encoding layers. This result provides evidence that our proposed solution, MMD-AE, is more effective than the previous DTL models, which perform the transferring task only at the bottleneck layer of the AE.

B. PERFORMANCE COMPARISON
Table 2 presents the AUC scores of AE, SKL-AE, SMD-AE, and MMD-AE when they are trained on the dataset with label information in the columns and the dataset without label information in the rows, and tested on the dataset in the rows. In this table, the result of MMD-AE is printed in boldface. We can observe that AE is the worst among the tested methods. When an AE is trained on one IoT dataset (the source) and evaluated on another IoT dataset (the target), its performance is poor. The reason is that the data to be predicted in the target domain differ considerably from the training data in the source domain.

Conversely, the results of the three DTL models are much better than that of AE. For example, if the source dataset is IoT-1 and the target dataset is IoT-3, the AUC score improves from 0.600 to 0.745 and 0.764 with SKL-AE and SMD-AE, respectively. These results show that using DTL helps to improve the accuracy of AEs in detecting IoT attacks on the target domain.

More importantly, our proposed method, i.e., MMD-AE, achieves the highest AUC score on almost all IoT datasets.³ For example, its AUC score is 0.937, compared to 0.600, 0.745, and 0.764 for AE, SKL-AE, and SMD-AE, respectively, when the source dataset is IoT-1 and the target dataset is IoT-3. The results on the other datasets are similar to those on IoT-3. These results demonstrate that implementing the transferring task in multiple layers of MMD-AE helps the model to transfer the label information from the source to the target domain more effectively. Consequently, MMD-AE often achieves better results than AE, SKL-AE, and SMD-AE in detecting IoT attacks in the target domain.

FIGURE 6. Training and testing of AE, SKL-AE, SMD-AE, and MMD-AE when the source domain is IoT-2 and the target domain is IoT-1.

C. PROCESSING TIME ANALYSIS
Fig. 6 shows the training and predicting time of the tested models when the source domain is IoT-2 and the target domain is IoT-1.⁴ In this figure, the training time is measured in hours and the predicting time is measured in seconds. It can be seen that the training process of the DTL methods

³ The AUC scores of the proposed model in each scenario are presented in bold.
⁴ The results on the other datasets are similar to this result.


TABLE 2. AUC scores of AE, SKL-AE, SMD-AE, and MMD-AE on nine IoT datasets.
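
For readers who want to compute a score of the kind reported in Table 2, AUC can be obtained directly from per-sample anomaly scores as the probability that a randomly chosen attack sample receives a higher score than a randomly chosen normal sample. The sketch below is a minimal illustration only: `auc_score` is a hypothetical helper, the labels and scores are made-up values, and how the tested models turn their outputs into scores is not restated here.

```python
import numpy as np


def auc_score(labels, scores):
    """AUC via pairwise comparison of attack (1) and normal (0) scores.

    Ties between an attack score and a normal score count as one half.
    """
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


# Made-up example: three normal samples followed by three attack samples.
labels = [0, 0, 0, 1, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.30, 0.90]
auc = auc_score(labels, scores)
print(round(auc, 3))  # 7 of the 9 attack/normal pairs are ranked correctly: 0.778
```

The pairwise form is quadratic in the number of samples, so for a test set with hundreds of thousands of samples a rank-based or library implementation would be the practical choice.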

(i.e., SKL-AE, SMD-AE, and MMD-AE) is more time-consuming than that of AE. One of the reasons is that the DTL models need to evaluate the MMD distance between AE1 and AE2 at every iteration, whereas this calculation is not required in AE. Moreover, the training time of MMD-AE is much higher than those of SKL-AE and SMD-AE, since MMD-AE needs to calculate the MMD distance at every encoding layer, whereas SKL-AE and SMD-AE only calculate the distance metric at the bottleneck layer.

However, it is important to note that the predicting time of all DTL methods is almost equal to that of AE. The reason is that the testing samples are fed to only one AE in all tested models. For example, the total predicting times of AE, SKL-AE, SMD-AE, and MMD-AE are 1.001, 1.112, 1.110, and 1.108 seconds, respectively, on the 778,810 testing samples of the IoT-1 dataset.

VII. CONCLUSION
In this paper, we have introduced a novel DTL-based approach for IoT network attack detection, namely MMD-AE. The proposed approach aims to address the problem of “lack of labeled information” for training the detection model in ubiquitous IoT devices. Specifically, the labeled data and unlabeled data are fed into two AE models with the same network structure. Moreover, the MMD metric is used to transfer knowledge from the first AE to the second AE. Compared to the previous DTL models, MMD-AE can operate at all the encoding layers instead of only the bottleneck layer.

We have carried out extensive experiments to evaluate the strength of our proposed model in many scenarios. The experimental results demonstrate that DTL approaches can enhance the AUC score for IoT attack detection. Furthermore, our proposed DTL model, i.e., MMD-AE, which performs the transfer at all levels of the encoding layers of the AEs, helps to improve the effectiveness of the transferring process. Thus, the proposed model is meaningful when label information is available in the source domain but not in the target domain.

One limitation of the proposed model is that it requires more time to train. However, the predicting time of MMD-AE is almost the same as that of the other AE-based models. In the future, one can extend our current work in several directions. First, we will distribute the training


process to the multiple IoT nodes by using the federated learning technique to speed up this process. Second, the current DTL model is developed based on AutoEncoder. In the future, we will attempt to extend this model based on other neural networks such as Deep Adaptation Network (DAN), Adversarial Discriminative Domain Adaptation (ADDA), Maximum Classifier Discrepancy (MCD), and Conditional Domain Adversarial Network (CDAN) [43].

REFERENCES
[1] N. C. Luong, D. T. Hoang, P. Wang, D. Niyato, D. I. Kim, and Z. Han, “Data collection and wireless communication in Internet of Things (IoT) using economic analysis and pricing models: A survey,” IEEE Commun. Surveys Tuts., vol. 18, no. 4, pp. 2546–2590, Jun. 2016.
[2] Y. Meidan, M. Bohadana, A. Shabtai, M. Ochoa, N. O. Tippenhauer, J. Davis Guarnizo, and Y. Elovici, “Detection of unauthorized IoT devices using machine learning techniques,” 2017, arXiv:1709.04647. [Online]. Available: http://arxiv.org/abs/1709.04647
[3] Y. Meidan, M. Bohadana, Y. Mathov, Y. Mirsky, A. Shabtai, D. Breitenbacher, and Y. Elovici, “N-BaIoT—Network-based detection of IoT botnet attacks using deep autoencoders,” IEEE Pervasive Comput., vol. 17, no. 3, pp. 12–22, Jul./Sep. 2018.
[4] I. Ahmed, A. P. Saleel, B. Beheshti, Z. A. Khan, and I. Ahmad, “Security in the Internet of Things (IoT),” in Proc. 4th HCT Inf. Technol. Trends (ITT), Oct. 2017, pp. 84–90.
[5] N. Vlajic and D. Zhou, “IoT as a land of opportunity for DDoS hackers,” Computer, vol. 51, no. 7, pp. 26–34, Jul. 2018.
[6] C. Kolias, G. Kambourakis, A. Stavrou, and J. Voas, “DDoS in the IoT: Mirai and other botnets,” Computer, vol. 50, no. 7, pp. 80–84, 2017.
[7] R. Gow, F. A. Rabhi, and S. Venugopal, “Anomaly detection in complex real world application systems,” IEEE Trans. Netw. Service Manage., vol. 15, no. 1, pp. 83–96, Mar. 2018.
[8] S. Khattak, N. R. Ramay, K. R. Khan, A. A. Syed, and S. A. Khayam, “A taxonomy of botnet behavior, detection, and defense,” IEEE Commun. Surveys Tuts., vol. 16, no. 2, pp. 898–924, 2nd Quart., 2014.
[9] J. Dromard, G. Roudiere, and P. Owezarski, “Online and scalable unsupervised network anomaly detection method,” IEEE Trans. Netw. Service Manage., vol. 14, no. 1, pp. 34–47, Mar. 2017.
[10] H. Bahsi, S. Nomm, and F. B. La Torre, “Dimensionality reduction for machine learning based IoT botnet detection,” in Proc. 15th Int. Conf. Control, Autom., Robot. Vis. (ICARCV), Nov. 2018, pp. 1857–1862.
[11] C. Zhang and R. C. Green II, “Communication security in Internet of Thing: Preventive measure and avoid DDoS attack over IoT network,” in Proc. 18th Symp. Commun. Netw., Alexandria, VA, USA, Apr. 2015, pp. 8–15.
[12] C. Dietz, R. L. Castro, J. Steinberger, C. Wilczak, M. Antzek, A. Sperotto, and A. Pras, “IoT-botnet detection and isolation by access routers,” in Proc. 9th Int. Conf. Netw. Future (NOF), Nov. 2018, pp. 88–95.
[13] M. Nobakht, V. Sivaraman, and R. Boreli, “A host-based intrusion detection and mitigation framework for smart home IoT using OpenFlow,” in Proc. 11th Int. Conf. Availability, Rel. Secur. (ARES), Salzburg, Austria, Aug. 2016, pp. 147–156.
[14] J. Ceron, K. Steding-Jessen, C. Hoepers, L. Granville, and C. Margi, “Improving IoT botnet investigation using an adaptive network layer,” Sensors, vol. 19, no. 3, p. 727, Feb. 2019.
[15] L. Vu, V. L. Cao, Q. U. Nguyen, D. N. Nguyen, D. T. Hoang, and E. Dutkiewicz, “Learning latent distribution for distinguishing network traffic in intrusion detection system,” in Proc. IEEE Int. Conf. Commun. (ICC), May 2019, pp. 1–6.
[16] V. C. Loi, M. Nicolau, and J. McDermott, “Learning neural representations for network anomaly detection,” IEEE Trans. Cybern., vol. 49, no. 8, pp. 3074–3087, Aug. 2019.
[17] O. Ibidunmoye, A.-R. Rezaie, and E. Elmroth, “Adaptive anomaly detection in performance metric streams,” IEEE Trans. Netw. Service Manage., vol. 15, no. 1, pp. 217–231, Mar. 2018.
[18] L. Wen, L. Gao, and X. Li, “A new deep transfer learning based on sparse auto-encoder for fault diagnosis,” IEEE Trans. Syst., Man, Cybern. Syst., vol. 49, no. 1, pp. 136–144, Jan. 2019.
[19] S. Jialin Pan and Q. Yang, “A survey on transfer learning,” IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010.
[20] J. Lu, V. Behbood, P. Hao, H. Zuo, S. Xue, and G. Zhang, “Transfer learning using computational intelligence: A survey,” Knowl.-Based Syst., vol. 80, pp. 14–23, May 2015.
[21] A. L. Buczak and E. Guven, “A survey of data mining and machine learning methods for cyber security intrusion detection,” IEEE Commun. Surveys Tuts., vol. 18, no. 2, pp. 1153–1176, 2nd Quart., 2016.
[22] S. García, A. Zunino, and M. Campo, “Botnet behavior detection using network synchronism,” in Privacy, Intrusion Detection and Response: Technologies for Protecting Networks. Hershey, PA, USA: IGI Global, 2012, pp. 122–144.
[23] W. Zellinger, T. Grubinger, E. Lughofer, T. Natschläger, and S. Saminger-Platz, “Central moment discrepancy (CMD) for domain-invariant representation learning,” 2017, arXiv:1702.08811. [Online]. Available: http://arxiv.org/abs/1702.08811
[24] A. Nisioti, A. Mylonas, P. D. Yoo, and V. Katos, “From intrusion detection to attacker attribution: A comprehensive survey of unsupervised methods,” IEEE Commun. Surveys Tuts., vol. 20, no. 4, pp. 3369–3388, Jul. 2018.
[25] Y. Xu, S. J. Pan, H. Xiong, Q. Wu, R. Luo, H. Min, and H. Song, “A unified framework for metric transfer learning,” IEEE Trans. Knowl. Data Eng., vol. 29, no. 6, pp. 1158–1171, Jun. 2017.
[26] K. Weiss, T. M. Khoshgoftaar, and D. Wang, “A survey of transfer learning,” J. Big Data, vol. 3, no. 1, May 2016.
[27] C. Tan, F. Sun, T. Kong, W. Zhang, C. Yang, and C. Liu, “A survey on deep transfer learning,” in Proc. Int. Conf. Artif. Neural Netw. Rhodes, Greece: Springer, Oct. 2018, pp. 270–279.
[28] S. Gou, Y. Wang, L. Jiao, J. Feng, and Y. Yao, “Distributed transfer network learning based intrusion detection,” in Proc. IEEE Int. Symp. Parallel Distrib. Process. Appl., Aug. 2009, pp. 511–515.
[29] J. Zhao, S. Shetty, J. W. Pan, C. Kamhoua, and K. Kwiat, “Transfer learning for detecting unknown network attacks,” EURASIP J. Inf. Secur., vol. 2019, p. 1, Feb. 2019.
[30] I. T. Jolliffe, Principal Component Analysis, 2nd ed. New York, NY, USA: Springer-Verlag, 2002.
[31] F. Zhuang, X. Cheng, P. Luo, S. J. Pan, and Q. He, “Supervised representation learning: Transfer learning with deep autoencoders,” in Proc. 24th Int. Joint Conf. Artif. Intell., 2015, pp. 4119–4125.
[32] C. Kandaswamy, L. M. Silva, L. A. Alexandre, R. Sousa, J. M. Santos, and J. M. de Sa, “Improving transfer learning accuracy by reusing stacked denoising autoencoders,” in Proc. IEEE Int. Conf. Syst., Man, Cybern. (SMC), Oct. 2014, pp. 1380–1387.
[33] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016. [Online]. Available: http://www.deeplearningbook.org
[34] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, “Greedy layer-wise training of deep networks,” in Proc. Adv. Neural Inf. Process. Syst., 2007, pp. 153–160.
[35] A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf, and A. J. Smola, “A Kernel method for the two-sample-problem,” in Proc. Adv. Neural Inf. Process. Syst., 2007, pp. 513–520.
[36] P. Yang, F. Luo, S. Wu, J. Xu, and D. Zhang, “Learning unsupervised word mapping via maximum mean discrepancy,” in Proc. CCF Int. Conf. Natural Lang. Process. Chin. Comput. Dunhuang, China: Springer, Oct. 2019, pp. 290–302.
[37] S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, “Domain adaptation via transfer component analysis,” IEEE Trans. Neural Netw., vol. 22, no. 2, pp. 199–210, Feb. 2011.
[38] (2020). Tcptrace Tool for Analysis of TCP Dump Files. [Online]. Available: http://www.tcptrace.org/
[39] (2020). Wireshark Tool, the World’s Foremost and Widely-Used Network Protocol Analyzer. [Online]. Available: https://www.wireshark.org/
[40] D. J. Hand and R. J. Till, “A simple generalisation of the area under the ROC curve for multiple class classification problems,” Mach. Learn., vol. 45, no. 2, pp. 171–186, 2001.
[41] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” 2014, arXiv:1412.6980. [Online]. Available: http://arxiv.org/abs/1412.6980
[42] J. Yang, R. Yan, and A. G. Hauptmann, “Cross-domain video concept detection using adaptive SVMs,” in Proc. 15th Int. Conf. Multimedia (MULTIMEDIA), 2007, pp. 188–197.
[43] Y. Zhang, T. Liu, M. Long, and M. Jordan, “Bridging theory and algorithm for domain adaptation,” in Proc. 36th Int. Conf. Mach. Learn., 2019, pp. 7404–7413.


LY VU received the M.S. degree from Inha University, South Korea, in 2014. She is currently pursuing the Ph.D. degree in the major of mathematics theory for information technology with Le Quy Don Technical University, Vietnam. Her research interests include data mining, machine learning, deep learning, and network security.

QUANG UY NGUYEN received the Ph.D. degree from University College Dublin, Ireland, in 2011. He is currently a Senior Lecturer with Le Quy Don Technical University (LQDTU), where he is also the Director of the Machine Learning and Applications Research Group. His research interests include machine learning, computer vision, information security, evolutionary algorithms, and genetic programming.

DIEP N. NGUYEN (Senior Member, IEEE) received the M.E. degree in electrical and computer engineering from the University of California San Diego (UCSD) and the Ph.D. degree in electrical and computer engineering from The University of Arizona (UA). He is a Faculty Member of the Faculty of Engineering and Information Technology, University of Technology Sydney (UTS). Before joining the UTS, he was a DECRA Research Fellow of Macquarie University, and a Member of Technical Staff at Broadcom, CA, USA, and ARCON Corporation, Boston, consulting the Federal Administration of Aviation on turning detection of UAVs and aircraft, a U.S. Air Force Research Lab on anti-jamming. His current research interests include computer networking, wireless communications, and machine learning applications, with an emphasis on systems’ performance and security/privacy. He has received several awards from LG Electronics, the UCSD, the UA, the U.S. National Science Foundation, and the Australian Research Council. He is an Associate Editor of the IEEE Transactions on Mobile Computing and a Guest Editor of IEEE Access.

DINH THAI HOANG (Member, IEEE) received the Ph.D. degree in computer science and engineering from Nanyang Technological University, Singapore, in 2016. He is currently a Faculty Member of the School of Electrical and Data Engineering, University of Technology Sydney, Australia. His research interests include emerging topics in wireless communications and networking such as ambient backscatter communications, vehicular communications, cybersecurity, the IoT, and 5G networks. He is currently an Editor of the IEEE Wireless Communications Letters and the IEEE Transactions on Cognitive Communications and Networking. He was an Exemplary Reviewer of the IEEE Transactions on Communications, in 2018, and the IEEE Transactions on Wireless Communications, in 2017 and 2018.

ERYK DUTKIEWICZ (Senior Member, IEEE) received the B.E. degree in electrical and electronics engineering and the M.Sc. degree in applied mathematics from The University of Adelaide, in 1988 and 1992, respectively, and the Ph.D. degree in telecommunications from the University of Wollongong, in 1996. His industry experience includes the management of the Wireless Research Laboratory, Motorola, in the early 2000s. He also holds a professorial appointment at Hokkaido University, Japan. He is currently the Head of the School of Electrical and Data Engineering, University of Technology Sydney, Australia. His current research interests include 5G and the IoT networks.
