LBDMIDS: LSTM Based Deep Learning Model for Intrusion Detection Systems for IoT Networks

Kumar Saurabh¹, Saksham Sood¹, P. Aditya Kumar¹, Uphar Singh¹, Ranjana Vyas¹, O.P. Vyas¹, Rahamatullah Khondoker²
¹ Indian Institute of Information Technology, Allahabad, India
² Department of Business Informatics, THM University of Applied Sciences, Friedberg, Germany

arXiv:2207.00424v1 [cs.CR] 23 Jun 2022

978-1-6654-8453-4/22/$31.00 ©2022 IEEE

Abstract—In recent years, we have witnessed a huge growth in the number of Internet of Things (IoT) and edge devices being used in our everyday activities. This demands that the security of these devices against cyber attacks be improved to protect their users. For years, Machine Learning (ML) techniques have been used to develop Network Intrusion Detection Systems (NIDS) with the aim of increasing their reliability and robustness. Among the earlier ML techniques, Decision Trees (DT) performed well. In recent years, Deep Learning (DL) techniques have been used in an attempt to build more reliable systems. In this paper, a Deep Learning enabled Long Short Term Memory (LSTM) Autoencoder and a 13-feature Deep Neural Network (DNN) model were developed, which performed considerably better in terms of accuracy on the UNSW-NB15 and Bot-IoT datasets. Hence we proposed LBDMIDS, where we developed NIDS models based on variants of LSTM, namely stacked LSTM and bidirectional LSTM, and validated their performance on the UNSW-NB15 and Bot-IoT datasets. This paper concludes that these variants in LBDMIDS outperform classic ML techniques and perform similarly to the DNN models that have been suggested in the past.

Index Terms—IoT Security, Intrusion, IDS, LSTM, Deep Learning

I. INTRODUCTION

Internet of Things (IoT) is a collection of devices which gather and share data over the Internet. Data is gathered using sensors, which are embedded in these devices. IoT devices are used in many areas, ranging from household devices like smart watches, smart bulbs, smart air conditioners and temperature sensors to more complex devices like smart vehicles and smart electrical grids. They are also extensively used in manufacturing, transportation, infrastructure, military equipment and healthcare. It was estimated that over 35 billion IoT devices would be in use by 2021 [1]. This high number has contributed to an increasing number of cyber attacks on IoT networks, which demands that the security of these devices be strengthened, as the current security measures have proven to be inadequate [2].

Network Intrusion Detection Systems (NIDS) are used to detect cyber attacks and malicious activities like Denial of Service (DoS), Distributed DoS (DDoS), Worms, Backdoor, etc. Network traffic is monitored and potential threats are identified. Signature-based NIDS are good at detecting known attacks but fail at detecting attacks which have not been seen before. Anomaly-based NIDS are good at detecting new attacks, as they focus more on attack patterns, but give high false positives.

Therefore, NIDS integrated with ML and DL techniques have been developed to better identify new threats [3]. Earlier, ML and DL methods were widely applied to develop systems capable of detecting intrusions in networks for conventional and extensive communication. However, every method came with limitations of its own, and the need for a better system gradually increased as the classical models failed to handle the heterogeneity of data. Also, the dataset used should contain multiple types of attack vectors (like DoS, DDoS, Worms) to support multi-class classification of attack categories. Traditional models are unable to detect zero-day attacks, as the dataset cannot hold entries which are new to the attack analysis.

So, a smart model is needed which can detect any anomaly that arises from a deviation from the normal behavior of the associated network. Such an approach could also detect zero-day attacks, as the model would not depend solely on pre-built attack classes. However, it introduces a bias towards the general class, which raises the false positive rate. To obtain an intermediate model that is more sustainable, DL needs to be implemented.

Classic ML techniques have been used in this field for more than 20 years, largely with the KDD99 dataset in mind. With the technological boom, however, attack categories are increasing rapidly, so the potential of any classic ML model falls far short of the reach of modern intrusions. Methods like SVM (Support Vector Machine), DT (Decision Tree), KNN (K-Nearest Neighbor), etc. were therefore ruled out for industrial applications long ago. Hybrid methods using these models worked well for a long time, but eventually fall short of recent DL models. Apart from reinforcement learning, ML models are based on supervised and unsupervised learning techniques. ML depends heavily on how rich the available dataset is. The inability to scale the data accordingly also poses a large limitation to the extensive use of any ML model.

To handle such issues, DL methods like ANN (Artificial Neural Networks) came into the picture. Here the model is built on neurons which are coordinated by parameters and hyper-parameters. To scale the input and use it on an extensive scale, the number of layers is chosen to get maximum efficiency. DL methods have proved far better than ML models in terms of accuracy, precision, etc., with the ability to handle large amounts of data. ANN, CNN (Convolutional Neural Networks), RNN (Recurrent Neural Networks) and FDN (Feed Forward Deep Networks) are examples of DL architectures. DL techniques in general have outperformed ML techniques.

As research progressed, it was seen that basic DL architectures lacked the ability to detect unknown attacks and, even when they did, the false positives and false negatives were very high. To tackle this problem, LSTM (Long Short-Term Memory) was introduced. This methodology is a special form of RNN. The most distinguishing feature of LSTM is that it keeps information for later use in the system. Thus, it can handle time-series data that varies with time.

The primary motivation for using LSTM lies in its approach, which is not restricted by the limitations of conventional DL (neural network) methods. Here, the input and output sequences between layers are variable, which allows the model to work efficiently in detecting known as well as unknown attacks.

The paper is organized as follows: Section II contains related research which demonstrates the earlier work done, Section III describes the datasets used in our experiments, Section IV includes our proposed methodology, Section V contains the results and performance analysis, and Section VI gives the conclusion and future works.

II. LITERATURE REVIEW

In paper [4], an IDS was developed to learn the behavior of normal network traffic. The dataset used was UNSW-NB15 for communication of external networks. The methods experimented with were Artificial Immune System (AIS), Filtered-based SVM (FSVM), Euclidean Distance Map (EDM), and Geometric Area Analysis (GAA), which achieved 85%, 92%, 90% and 93% respectively.

In [5], the Bot-IoT and UNSW-NB15 datasets are considered in the experiment. After data standardization, an MLP (Multi-Layered Perceptron) model is used, followed by adapting the hyperparameters. The comparison was based on the Bot-IoT dataset while the classifier was built on the UNSW-NB15 dataset. ARM (Association Rule Mining) and Naïve Bayes only had 85% and 72% accuracy respectively. The normal perceptron model was basic and gave 63% accuracy, but when converted to the 13-feature DNN model, it gave 99% accuracy. Among the ML techniques, Decision Tree gave the highest accuracy (93%). The process of training the models consumes a lot of time and memory [5].

In paper [6], the authors explore an ML based approach. Their experiment showed that ML methods along with flow identifiers were effective in detecting botnet attacks. Four ML algorithms were used, namely ARM, Artificial Neural Network (ANN), Naïve Bayes and Decision Tree. Decision Tree gave the best results with the highest accuracy of 93.23% and the lowest False Alarm Rate of 6.77%, whereas ANN was the least accurate with an accuracy of 63.97% and a False Alarm Rate of 36.03%.

In [7], the authors addressed the lack of a dataset which appropriately reflects modern network traffic and attacks. That paper describes the creation of the UNSW-NB15 dataset, which has 49 features developed using the Argus and Bro-IDS tools.

In paper [8], the authors proposed a hybrid IDS for IoT networks by combining CNN (Convolutional Neural Network) and LSTM. They used the UNSW-NB15 dataset and compared the performance of an RNN and the hybrid IDS for binary classification, achieving accuracies of 95.7% and 98.7% respectively.

In [9], the authors compared the performance of DL methods like RNN, GRU and Text-CNN with some of the traditional ML methods on the KDD99 and ADFA-LD datasets. The DL methods were able to outperform the traditional ML techniques.

In paper [10], the authors implemented Linear Discriminant Analysis (LDA), Classification and Regression Tree (CART) and Random Forest (RF) on the KDD99 dataset. The accuracies achieved were 98.1%, 98% and 99.65% respectively.

In [11], the authors implemented a Feed-Forward Neural Network for binary and multi-class classification on the Bot-IoT dataset for normal traffic and three different attack categories (DDoS/DoS, Reconnaissance, Information Theft). They were able to achieve 98%, 99.4%, 98.4% and 88.9% accuracy on the normal and the three attack categories respectively.

In paper [12], the authors proposed an IDS based on blockchain and deep learning. The DNN model was able to achieve 98% accuracy for binary classification and 97% accuracy for multi-class classification on the NSL-KDD dataset.

III. DATASET EXPLANATION

In order to train our models and to determine their reliability in the testing phase, data selection is necessary. In the past, datasets like KDD98, KDDCUP99 and NSL-KDD were used as comprehensive datasets for Network Intrusion Detection Systems (NIDS). Recent research has shown that these datasets do not reflect modern network traffic (normal and attack vectors). Hence, we selected datasets that were made publicly available by researchers in the last few years. These datasets contain labelled network data that were generated in labs with the help of a virtual network setup. The datasets are a hybrid of normal network traffic and synthetic botnet attack traffic. The datasets selected are:

A. UNSW-NB15 Dataset

Widely in use since 2015 as a dataset for evaluating NIDS, it was created by the Cyber Range Lab of UNSW-Canberra [14-17]. The dataset contains 49 features (excluding class labels) and a total of 2,540,044 records that were split across 4 CSV files. The dataset contains a total of 10 class labels, out of which there are 9 types of botnet attack vectors and 1 Normal class. The different types of attack traffic are as follows: analysis, fuzzers, backdoor, generic, Denial of Service (DoS), reconnaissance, shellcode, exploits and worms [13]. Out of the 49 features, 13 important ones were selected in the paper [4], namely source IP address, source port, destination IP address, destination port, duration, source bytes, destination bytes, source TTL, destination TTL, source load, destination load, source packets and destination packets.

Fig. 1. Proposed LBDMIDS architecture

B. Bot-IoT Dataset

Created by the Cyber Range Lab of UNSW-Canberra in 2019, it includes network and botnet traffic incorporated into a network environment. The dataset contains 47 features including class label, attack category and attack subcategory [18]. A total of 72 million records of normal and attack traffic constitute the dataset. A pre-selected training and testing dataset which includes the 10 best features was created and has been used for training and validation purposes. The training and testing datasets have a combined total of 3.6 million records. The dataset includes attack vectors of types DDoS, DoS, OS and service scan, keylogging and data exfiltration attacks. On the basis of the protocol used, the DDoS and DoS categories are further divided into different types.

IV. PROPOSED METHODOLOGY

A. Working Principles

Let us understand the working principles behind our project. We are using two variants of LSTM in our research, Stacked and Bidirectional. First, let us look at Feed Forward Neural Networks and Recurrent Neural Networks.

1) Feed Forward Neural Networks: This model consists of an input layer and an output layer with several hidden layers in between. It is generally designed to contain three or more hidden layers. The layers consist of biases and weights. The input data is injected into the first layer, whose output becomes the input for the next layer. The number of nodes is adjusted so that it matches the number of features in the input data. The weights and biases used in the layers are initialised randomly but later optimised through a method known as back-propagation.

An important note here is that the model is "one-way" and there are no backward connections. The computation flows in one direction: any input generates an output, and there is no connection through which the output could be injected back into the model. The absence of backward connections gives rise to limitations such as the "error-sum" problem, where the error generated in one layer keeps increasing in the subsequent layers. As there is no option of taking feedback from the output, the input and output become independent, which makes the tuning of hyper-parameters by the model an unachievable task.

2) Recurrent Neural Networks: RNN is a type of network normally distinguished by its "memory". The layers present in the model usually take account of the previous inputs while considering the input of a certain layer. While traditional neural networks work on the principle that input and output are generated independently, RNN works by feeding the output back into the system, which generally helps in reducing the error and balancing the model.

Another difference between RNN and FFNN (Feed Forward Neural Networks) lies in the weights and parameters associated with the layers of the network. Each node in FFNN uses different weights, which are adjusted through gradient descent and back-propagation. In RNN, the parameters are shared between the layers of the network, although they are still adjusted by the same processes used in FFNN (gradient descent and back-propagation).

By taking "feedback" from the succeeding layers, RNN is able to adjust its parameters by analysing the errors encountered by the model. This works well in keeping the error percentage within a threshold and also fits the model better.

While overcoming the problem faced by FFNN, RNN faces two other limitations, known as the vanishing gradient and exploding gradient problems. These problems generally depend upon the error function of the model. Here, the gradient is defined as the slope of the error function. If the slope becomes too low, it will continue to decrease as the threshold keeps decreasing. This stops the model from learning: the parameters in the model become negligible and the memory of the model saturates. On the other hand, when the slope becomes too high, the model becomes unstable and the data fed to the model generates fluctuating outputs, which in turn renders the model parameters too large to be considered.

3) Long Short-Term Memory: LSTMs are a special variant of RNN. They are particularly used in learning long-term dependencies. RNNs fail in learning information when the dataset is large and the entries are very similar. So, LSTM was designed to handle information for longer feeds of input and work for larger epochs.

As the layers of an RNN have a very simple structure, for example a single tanh layer [19], remembering information for a long period of time is a struggle for them. On the other hand, in a general LSTM model, there are four interacting layers which work in a special way to remember the important information for a long duration of time.

The problem of vanishing gradient faced by RNN is also solved by LSTM, as it continues to learn new information while keeping the previous information. Hence the significance of the parameters is maintained and the model becomes stable.

LSTMs also have different variations. Although the difference between them is small, it can be of great significance depending upon the input data fed to the network. Common types of LSTM are Classic, Stacked and Bidirectional. Stacked LSTM has several LSTM layers and can only access past samples. Bi-directional LSTM has two layers and has access to both past and future samples.

B. Network Intrusion Detection System (NIDS) Architecture

We proposed a NIDS based on two variants of LSTM, Stacked LSTM and Bi-directional LSTM, to help separate normal and attack vectors in IoT network traffic. The workflow of the NIDS architecture can be divided into three phases, namely Data Preprocessing, Training of the Model and Model Validation, as shown in Fig. 1.

C. Data Preprocessing

In the Data Preprocessing stage, the raw data extracted from IoT networks in the form of PCAP/CSV files is processed into a form suitable to be fed to the LSTM model. The data is filtered to remove any redundancies and null values. Afterwards, the most important features are selected to be fed into the proposed model, followed by z-score normalization of the features, which ensures similar distributions for each feature.

1) Feature Extraction: We reduced the dimensionality of the raw data by selecting the most important features, which made the data suitable for processing. Very often, large datasets have a lot of redundant and correlated data that can be filtered out without losing important or relevant information. In the case of UNSW-NB15, we selected the following 13 most important features [4]:
1) Source ip address - IP address of the attacker computer
2) Source port - Port number of the attacker computer
3) Destination ip address - IP address of the victim computer
4) Destination port - Port number of the victim computer
5) Duration - Record total duration of transaction
6) Source bytes - Number of bytes sent from source to the destination
7) Destination bytes - Number of bytes sent from destination to the source
8) Source TTL - Source to destination time to live value
9) Destination TTL - Destination to source time to live value
10) Source load - Transmission rate in bits per second
11) Destination load - Reception rate in bits per second
12) Source packets - Number of packets sent from source to the destination
13) Destination packets - Number of packets sent from destination to the source

For the Bot-IoT dataset, the 10 features that were pre-selected in the reduced training and testing set were:
1) rate - Total packets per second in transaction
2) srate - Source-to-destination packets per second
3) drate - Destination-to-source packets per second
4) min - Minimum duration of aggregated records
5) max - Maximum duration of aggregated records
6) mean - Average duration of aggregated records
7) std dev - Standard deviation of aggregated records
8) state number - Numerical representation of feature state
9) flgs number - Numerical representation of feature flags
10) seq - Argus sequence number

2) z-scale Normalization: Normalization transforms the data so that all features are distributed similarly. It helps the model to treat each feature with similar importance, as it gives similar weight to each feature. Assuming the feature subspace has N rows and M columns, i.e., $X \in \mathbb{R}^{N \times M}$, the z-scale normalization can be implemented as follows:

$$\mu_m = \frac{1}{N} \sum_{i=0}^{N-1} x_{im} \tag{1}$$

$$\sigma_m = \sqrt{\frac{1}{N} \sum_{i=0}^{N-1} (x_{im} - \mu_m)^2} \tag{2}$$

The z-scale normalized feature vector can be obtained as

$$z_m = \frac{X_m - \mu_m}{\sigma_m} \tag{3}$$

Here, $\mu_m$ is the mean of the entries of the m'th column and $\sigma_m$ is the standard deviation of the entries of the m'th column.

Algorithm 1 Z-SCALE NORMALIZATION
  for each col m in X (0, 1, 2, ..., M-1)
    µ_m ← compute (1)
    σ_m ← compute (2)
    z_m ← compute (3)
  end for

D. Training and Validation Process

The Data Preprocessing stage transforms the data into a form better suited to the model, which leads to more accurate predictions. The data is then reshaped for the LSTM layers: a suitable timesteps parameter is selected and the dataset is reshaped to three dimensions (samples, timesteps, features), where timesteps is the number of past samples the LSTM model looks back at. The training and validation period of the model consists of feeding the time-series sequential data to the LSTM layers.

1) Stacked LSTM: In the case of stacked LSTM, there are multiple layers of LSTM stacked on top of one another. The LSTM layers help in uncovering the patterns and the dependence of features on their class labels, because they can learn at higher levels of abstraction. The input LSTM layer is followed by a batch of hidden LSTM layers that process sequenced input and combine learning patterns from the previous layers to produce learning representations at higher levels of abstraction. The Dense layer is the final layer, with the number of nodes equal to the number of categories in the output label. In the decision phase, the softmax activation function [19] is used by the Dense layer to select the most probable of the output classes, and the prediction error is calculated with the help of 'Sparse Categorical Crossentropy', which is then backpropagated to adjust the weights of the neural network. The model hyperparameters are shown in Table I.

TABLE I
Model hyper-parameters for Stacked LSTM

Dataset      No. of LSTM Layers   No. of Cells/Layer   Epochs   Learning Rate
UNSW-NB15    4                    40, 128, 128, 64     50       0.002
Bot-IoT      2                    32, 32               5        0.002

2) Bi-Directional LSTM: In the case of Bi-Directional LSTMs, the recurrent network layer is replicated and works alongside the first layer. The first layer processes the input sequence, while a reversed copy of the input sequence is fed to the second layer. Learning from the past instances and the future instances provides more context to the network and results in better learning. The input layer is a Bi-Directional LSTM layer which feeds forward a non-sequential output to a Dense layer. The Dense layer is the final layer, with the number of nodes equal to the number of categories in the output label. In the decision phase, the softmax activation function is used by the Dense layer to select the most probable of the output classes, and the prediction error is calculated with the help of 'Sparse Categorical Crossentropy', which is then backpropagated to adjust the weights of the neural network. The model hyperparameters are shown in Table II.

TABLE II
Model hyper-parameters for Bi-LSTM

Dataset      No. of LSTM Layers   No. of Cells/Layer   Epochs   Learning Rate
UNSW-NB15    1                    64                   50       0.0015
Bot-IoT      1                    12                   5        0.001

3) Model Structure and Parameters: The model is trained over a number of epochs, and the training and validation loss decrease gradually with time. The learning process is stopped when the number of epochs crosses the maximum limit or the model starts overfitting on the training dataset.

V. RESULTS AND PERFORMANCE ANALYSIS

A. Experimental Setup

We used Google Colab's GPU. The specifications were an Intel(R) Xeon(R) CPU with 2 [email protected] GHz, 12.7 GB of RAM and 78 GB of hard disk space. The version of Python installed was 3.7.12 and TensorFlow was 2.7.0.

B. Evaluation Metrics

There is no single metric which can accurately tell how good a particular model is. Hence, we have used several metrics to evaluate the DL models:
• True Positive (TP): Number of correctly predicted attack samples.
• False Positive (FP): Number of falsely predicted attack samples.
• True Negative (TN): Number of correctly predicted normal samples.
• False Negative (FN): Number of falsely predicted normal samples.
• Accuracy: Ratio of correctly predicted samples to total samples.

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4}$$

• Precision: Ratio of correctly predicted attack samples to total predicted attack samples.

$$\mathrm{PR} = \frac{TP}{TP + FP} \tag{5}$$

• Recall: Ratio of correctly predicted attack samples to total number of attack samples.

$$\mathrm{RE} = \frac{TP}{TP + FN} \tag{6}$$

• F1 Score: Harmonic mean of precision and recall.

$$\mathrm{F1\,Score} = \frac{2 \cdot \mathrm{Recall} \cdot \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}} \tag{7}$$

• Weighted avg: The data points which have higher frequency contribute more than others.

TABLE III
Stacked LSTM

Attack            Precision   Recall   F1
Normal            0.98        1.00     0.99
Exploits          0.56        0.86     0.67
Reconnaissance    0.79        0.51     0.62
DoS               0.87        0.01     0.01
Generic           1.00        0.98     0.99
Shellcode         0.62        0.41     0.49
Fuzzers           0.52        0.29     0.37
Worms             1.00        0.00     0.00
Backdoor          1.00        0.00     0.00
Analysis          1.00        0.00     0.00
Weighted avg      0.97        0.96     0.96

C. Performance Analysis on UNSW-NB15 Dataset:

The dataset was split into 75% for training and 25% for validation. The training phase consisted of 50 epochs that lasted over 5 hours. In the case of Stacked LSTM, the validation phase took 128 seconds with a processing speed of 0.2 ms/sample, and for Bidirectional LSTM, the validation phase took 90 seconds with a processing speed of 0.14 ms/sample. The accuracy achieved by Stacked LSTM was 96.60% and the accuracy achieved by Bi-directional LSTM was 96.41%.
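The path a record takes before reaching the models (z-scale normalization per Eqs. (1)-(3), reshaping to (samples, timesteps, features), then the 75%/25% split) can be illustrated with a small standard-library sketch. This is not the authors' code: the function names, the toy records, the choice to label each window with the class of its last record, the handling of constant columns and the chronological split are all assumptions made for illustration.

```python
import math

def z_scale(rows):
    """Column-wise z-scale normalization following Eqs. (1)-(3)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
            for j in range(m)]
    # Assumption: a constant column (std = 0) is mapped to 0.0
    # instead of dividing by zero.
    return [[(r[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0
             for j in range(m)] for r in rows]

def make_windows(rows, labels, timesteps):
    """Group consecutive records into (samples, timesteps, features).

    Assumption: each window carries the label of its last record.
    """
    xs, ys = [], []
    for end in range(timesteps, len(rows) + 1):
        xs.append(rows[end - timesteps:end])
        ys.append(labels[end - 1])
    return xs, ys

def split_train_val(xs, ys, train_fraction=0.75):
    """Chronological split (assumed; the paper only states 75%/25%)."""
    cut = int(len(xs) * train_fraction)
    return (xs[:cut], ys[:cut]), (xs[cut:], ys[cut:])

# Hypothetical toy data: 6 records with 2 features each.
records = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0],
           [4.0, 40.0], [5.0, 50.0], [6.0, 60.0]]
labels = [0, 0, 1, 0, 1, 1]

normalized = z_scale(records)
windows, window_labels = make_windows(normalized, labels, timesteps=3)
(train_x, train_y), (val_x, val_y) = split_train_val(windows, window_labels)

# samples, timesteps, features
print(len(windows), len(windows[0]), len(windows[0][0]))
```

With 6 records and 3 timesteps the sketch yields 4 windows of shape 3 x 2, of which 3 go to training and 1 to validation. A real pipeline would normally compute the mean and standard deviation on the training portion only.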
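The per-class precision, recall and F1 values and the weighted average reported in the result tables can be reproduced from predicted and true labels along the lines of Eqs. (4)-(7). The sketch below is an illustration under assumptions, not the authors' evaluation code: it treats each class one-vs-rest, reports undefined ratios as 0.0 (the paper does not state its convention), and uses made-up labels.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Eq. (4): correctly predicted samples over all samples."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def per_class_report(y_true, y_pred):
    """Return {class: (precision, recall, f1, support)}.

    Assumption: each class is scored one-vs-rest, and a ratio with a
    zero denominator is reported as 0.0.
    """
    support = Counter(y_true)
    report = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0          # Eq. (5)
        recall = tp / (tp + fn) if tp + fn else 0.0             # Eq. (6)
        f1 = (2 * precision * recall / (precision + recall)     # Eq. (7)
              if precision + recall else 0.0)
        report[cls] = (precision, recall, f1, support[cls])
    return report

def weighted_average(report, index):
    """Average one metric column, weighting each class by its support."""
    total = sum(row[3] for row in report.values())
    return sum(row[index] * row[3] for row in report.values()) / total

# Hypothetical labels, for illustration only.
y_true = ["normal", "normal", "normal", "dos", "dos", "exploits"]
y_pred = ["normal", "normal", "dos", "dos", "exploits", "exploits"]

report = per_class_report(y_true, y_pred)
for cls, (pr, re, f1, n) in report.items():
    print(f"{cls:10s} {pr:.2f} {re:.2f} {f1:.2f} (support {n})")
print("accuracy", round(accuracy(y_true, y_pred), 4))
print("weighted F1", round(weighted_average(report, 2), 4))
```

Because the weighted average scales each class by its number of samples, a class with very few samples and near-zero recall barely moves the weighted figures, which is worth keeping in mind when reading the per-class rows next to the weighted averages.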

TABLE IV
Bi-directional LSTM

Attack            Precision   Recall   F1
Normal            0.99        0.99     0.99
Exploits          0.54        0.79     0.64
Reconnaissance    0.53        0.59     0.56
DoS               0.47        0.03     0.06
Generic           1.00        0.98     0.99
Shellcode         0.68        0.59     0.64
Fuzzers           0.56        0.35     0.43
Worms             0.53        0.16     0.25
Backdoor          0.67        0.00     0.01
Analysis          1.00        0.00     0.00
Weighted avg      0.96        0.96     0.96

D. Performance Analysis on Bot-IoT Dataset:

In the case of Bot-IoT, the training dataset had 2,934,817 samples and the testing dataset had 733,705 samples. The training phase took 20 minutes and consisted of 5 epochs for both the stacked and bi-directional LSTM models. In the case of Stacked LSTM, the validation phase lasted for 48 seconds with a processing speed of 0.06 ms/sample. For Bidirectional LSTM, the model took 156 seconds with a processing speed of 0.195 ms/sample. The accuracy achieved by Stacked LSTM was 99.99% and the accuracy achieved by Bi-directional LSTM was 99.99%.

Fig. 2. Training vs. validation accuracy (Stacked LSTM)

Fig. 3. Training vs. validation loss (Stacked LSTM)

Fig. 4. Training vs. validation accuracy (Bi-LSTM)

Fig. 5. Training vs. validation loss (Bi-LSTM)

TABLE V
Stacked LSTM & Bi-directional LSTM

                  Stacked LSTM                Bi-directional LSTM
Attack            Precision   Recall   F1     Precision   Recall   F1
Normal            0.98        0.83     0.89   1.00        1.00     1.00
DDoS              1.00        1.00     1.00   1.00        1.00     1.00
DoS               1.00        1.00     1.00   1.00        0.79     0.88
Reconnaissance    1.00        0.93     0.96   1.00        0.93     0.96
Theft             1.00        0.36     0.53   1.00        0.36     0.53
Weighted avg      1.00        1.00     1.00   1.00        1.00     1.00

E. Comparison with other techniques:

TABLE VI
Comparison of Results on Bot-IoT and UNSW-NB15 Datasets

Bot-IoT                                UNSW-NB15
Method              Accuracy (%)       Method        Accuracy (%)
ARM [5]             85.6               FSVM [4]      92
Decision Tree [5]   93.2               GAA [4]       93
DNN [5]             99.9               DNN [5]       99.2
Naive Bayes [5]     72.7               ANN [6]       63.97
Perceptron [5]      63.9               ARM [6]       86.45
-                   -                  RNN [8]       95.7
-                   -                  Hybrid [8]    98.7
LBDMIDS             99.9               LBDMIDS       96.6

Fig. 6. NIDS accuracy comparison of Bi-LSTM and Stacked LSTM on both datasets

The results in papers [4], [6] and [8] are for binary classification (Attack and Benign), while the results in paper [5] and for the proposed methodology (LBDMIDS) are for multi-class classification.

VI. CONCLUSION AND FUTURE WORKS

To protect IoT networks from attackers and their attacks, it is necessary to detect intrusions precisely. In this paper, a DL based model called LBDMIDS has been proposed which shows promising performance in intrusion detection. The focus was on detecting malicious events with improved accuracy in IoT networks where the dataset is large and consists of time-series data. LBDMIDS works on the LSTM architecture to detect intrusion in a network. To validate LBDMIDS, the experiment was performed on two well known datasets, i.e., Bot-IoT and UNSW-NB15. We scaled and normalized the datasets accordingly and fed them to LBDMIDS. The results produced by LBDMIDS are good in terms of prediction accuracy and F1-score. To generalize our model in LBDMIDS, Stacked and Bidirectional LSTM were used. On the UNSW-NB15 dataset, the accuracy achieved by the Stacked LSTM model was 96.60% and the accuracy achieved by the Bi-Directional LSTM model was 96.41%. Similarly, on the Bot-IoT dataset, the accuracy achieved by both the Stacked and Bi-Directional LSTM models was 99.99%.

The reported accuracy was constrained by system limitations and epoch duration, and could be improved significantly on more efficient and robust machines. Our experiment was not performed on an industrial scale or on real world IoT applications [20]. With more efficient GPUs and more hybrid DL models, the prediction accuracy could be improved.

REFERENCES

[1] Jack Steward, The Ultimate List of Internet of Things Statistics for 2022. URL https://findstack.com/internet-of-things-statistics/
[2] M. Nawir, A. Amir, N. Yaakob, O.B. Lynn, Internet of things (IoT): Taxonomy of security attacks, in: 2016 3rd International Conference on Electronic Design (ICED), IEEE, 2016, pp. 321–326.
[3] A. Javaid, Q. Niyaz, W. Sun, M. Alam, A deep learning approach for network intrusion detection system, in: Proceedings of the 9th EAI International Conference on Bioinspired Information and Communications Technologies (formerly BIONETICS), 2016, pp. 21–26.
[4] J. Ashraf, A. D. Bakhshi, N. Moustafa, H. Khurshid, A. Javed and A. Beheshti, "Novel Deep Learning-Enabled LSTM Autoencoder Architecture for Discovering Anomalous Events From Intelligent Transportation Systems," in IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 7, pp. 4507-4518, July 2021, doi: 10.1109/TITS.2020.3017882.
[5] Koroniotis, Nickolaos & Moustafa, Nour & Sitnikova, Elena. (2020). A new network forensic framework based on deep learning for Internet of Things networks: A particle deep framework. Future Generation Computer Systems. 110. 10.1016/j.future.2020.03.042.
[6] Koroniotis, Nickolaos & Moustafa, Nour & Sitnikova, Elena & Slay, Jill. (2017). Towards Developing Network forensic mechanism for Botnet Activities in the IoT based on Machine Learning Techniques.
[7] N. Moustafa and J. Slay, "UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)," 2015 Military Communications and Information Systems Conference (MilCIS), 2015, pp. 1-6, doi: 10.1109/MilCIS.2015.7348942.
[8] Smys, S., Abul Basar, and Haoxiang Wang. "Hybrid intrusion detection system for internet of things (IoT)." Journal of ISMAC 2.04 (2020): 190-199.
[9] Zhong, Ming, Yajin Zhou, and Gang Chen. "Sequential model based intrusion detection system for IoT servers using deep learning methods." Sensors 21.4 (2021): 1113.
[10] Saranya, T., et al. "Performance analysis of machine learning algorithms in intrusion detection system: A review." Procedia Computer Science 171 (2020): 1251-1260.
[11] M. Ge, X. Fu, N. Syed, Z. Baig, G. Teo and A. Robles-Kelly, "Deep Learning-Based Intrusion Detection for IoT Networks," 2019 IEEE 24th Pacific Rim International Symposium on Dependable Computing (PRDC), 2019, pp. 256-25609, doi: 10.1109/PRDC47002.2019.00056.
[12] Liang, Chao, et al. "Intrusion detection system for Internet of Things based on a machine learning approach." 2019 International Conference on Vision Towards Emerging Trends in Communication and Networking (ViTECoN). IEEE, 2019.
[13] N. Moustafa and J. Slay, "UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)," 2015 Military Communications and Information Systems Conference (MilCIS), 2015, pp. 1-6, doi: 10.1109/MilCIS.2015.7348942.
[14] Moustafa, Nour, and Jill Slay. "The evaluation of Network Anomaly Detection Systems: Statistical analysis of the UNSW-NB15 dataset and the comparison with the KDD99 dataset." Information Security Journal: A Global Perspective (2016): 1-14.
[15] Moustafa, Nour, et al. "Novel geometric area analysis technique for anomaly detection using trapezoidal area estimation on large-scale networks." IEEE Transactions on Big Data (2017).
[16] Moustafa, Nour, et al. "Big data analytics for intrusion detection system: statistical decision-making using finite dirichlet mixture models." Data Analytics and Decision Support for Cybersecurity. Springer, Cham, 2017. 127-156.
[17] Sarhan, Mohanad, Siamak Layeghy, Nour Moustafa, and Marius Portmann. NetFlow Datasets for Machine Learning-Based Network Intrusion Detection Systems. In Big Data Technologies and Applications: 10th EAI International Conference, BDTA 2020, and 13th EAI International Conference on Wireless Internet, WiCON 2020, Virtual Event, December 11, 2020, Proceedings (p. 117). Springer Nature.
[18] Koroniotis, Nickolaos & Moustafa, Nour & Sitnikova, Elena & Turnbull, Benjamin. (2018). Towards the Development of Realistic Botnet Dataset in the Internet of Things for Network Forensic Analytics: Bot-IoT Dataset.
[19] Sagar Sharma, Sep 6, 2017, Activation Functions in Neural Networks. URL https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6
[20] K. Saurabh, Vikash, L. Mishra and S. Varma, "An Efficient IoT Model for On-Demand Particulate Matter Control System in Coal Mining Cities," 2020 IEEE 17th India Council International Conference (INDICON), 2020, pp. 1-7, doi: 10.1109/INDICON49873.2020.9342085
