Convolutional Neural Networks for Multi-class Intrusion Detection System-2018
1 Introduction
IDS are mainly classified into two types: host-based IDS (HIDS) and network-based IDS
(NIDS). A HIDS is in general a piece of software that resides on the host system or
infrastructure and looks for suspicious activities occurring on the host or attempts to
invade it. A NIDS can be either software or dedicated hardware that tracks network
packets in real time or close to real time and tries to identify malicious patterns in the
network traffic [2]. The type of IDS deployed in the network infrastructure for
intrusion detection depends on the needs of the individual organization and the
available resources.
Key parameters such as performance requirements, reliability requirements, operating
systems and applications, risk management goals and security architecture differentiate
the use of intrusion detection mechanisms in IT infrastructure from their use in ICS
infrastructure. Detailed information about these priorities and their relation to security
parameters is given in [3]. These differences, together with continuous novel attacks on
ICS, highlight the importance of and need for research on IDS mechanisms that
improve defense-in-depth strategies. A detailed overview of the challenges and of
different scientific works on improving security in the context of ICS is given in [4].
Many IDS mechanisms exist, and some use deep learning techniques such as
Stacked Autoencoders (SAE) and Deep Belief Networks (DBN). However, very few
researchers have concentrated on classifying multiple attack types. Countering
malicious attacks requires good knowledge of the type of attack.
This paper concentrates on identifying different network attacks on ICS that
mainly impact the security parameters availability and confidentiality, using a
Convolutional Neural Network (CNN), a technique that many researchers in this
domain have overlooked.
The rest of the paper is structured as follows. Section 2 discusses the relevant
literature on the development of IDS using different deep learning techniques, along
with their outcomes and drawbacks. Section 3 gives an overview of the datasets used
for evaluating the performance of the proposed approach. Section 4 proposes the
CNN-based IDS architecture, with a detailed discussion of pre-processing, training and
testing. Section 5 discusses the implementation and the results obtained, comparing the
accuracy of intrusion detection using different performance metrics. Finally, Sect. 6
concludes the paper and proposes future work.
2 Related Work
With the DARPA Intrusion Detection Evaluation released in 1998 and 1999 in conjunction
with MIT [5], the concept of intrusion detection and the development of security
mechanisms in communication systems entered the mainstream of research. Since then,
several researchers have developed intrusion detection strategies and used existing
datasets such as KDD and NSL-KDD to evaluate the performance of the developed IDS.
A detailed analysis of different intrusion detection datasets is given in [6]. The
drawbacks of the existing datasets and the need for the NSL-KDD dataset were
discussed in [7]. Despite its age, the NSL-KDD dataset was used to evaluate the
performance of the proposed mechanism. As many researchers have used the same dataset, use
Convolutional Neural Networks for Multi-class IDS 227
of the same dataset makes our approach comparable with existing approaches. For this
reason, the related work section focuses on literature that used the NSL-KDD
dataset for the development of IDS.
Deep learning techniques are a subcategory of machine learning algorithms, but
discussing every machine learning algorithm used for the development of IDS is not
possible here. A detailed analysis of NSL-KDD data using various machine learning
techniques with the Waikato Environment for Knowledge Analysis (WEKA) tool is
given in [8]. Different deep learning techniques for IDS are discussed here.
Deep learning based studies report that deep learning surpasses the traditional
methods in intrusion detection. In [9], deep neural networks for flow-based anomaly
detection were proposed, showing that deep learning techniques can be used for
anomaly detection in software-defined networks. The work in [10] uses deep learning
with a self-taught learning technique and benchmarks its performance for network
intrusion detection on the NSL-KDD dataset. There, deep learning is used to distinguish
only the normal and attack classes; performance for multiclass classification was not
evaluated.
In [11], Recurrent Neural Networks (RNN) are treated as reduced-size networks.
The authors classify multiple attack classes and the performance looks promising.
However, training used only a part of the NSL-KDD training dataset rather than the
complete dataset, which biases the reported performance. They also concentrated
mainly on feature grouping rather than attack classification. Unfolded RNNs were used
in [12], again with a limited part of the NSL-KDD training dataset. Compared to
existing machine learning approaches, the detection accuracies are higher with RNN.
DBN for IDS was proposed in [13], which explained how it achieves higher accuracy.
The authors trained on 20%, 30% and 40% of the NSL-KDD training dataset and tested
on the same data.
Overcoming the above-mentioned drawbacks, [14] uses SAE for deep feature
extraction and multiclass attack classification. The results look promising and much
better than those of the existing approaches. To overcome the drawback of long training
time, [15] used accelerated computing platforms to train deep neural networks faster,
along with multi-class attack classification. In [16], hybrid techniques that combine
deep learning and machine learning were discussed. For better classification, a
combination of multiple detection mechanisms with a ranking approach that selects the
highest detection accuracy for each individual attack class was proposed.
Recently, [17] provided a detailed multiclass classification of the NSL-KDD
dataset using DBN and SAE. By proposing the nonsymmetric deep autoencoder, they
achieved higher detection accuracy than other approaches. They also performed a more
detailed 13-class classification to evaluate their approach, and the results look
promising. However, they used the same training dataset to test and evaluate the
proposed approach, which leads to inflated detection accuracies.
As CNNs are mainly applied to images, only one related work using CNN for the
development of IDS was found, and we used this approach as a basis for our
implementation. [18] provided an effective method for converting the NSL-KDD
dataset into images. The numerical features in NSL-KDD are normalized using min-max
228 S. Potluri et al.
normalization, and different binary values are then assigned to the different features of
the NSL-KDD data. These binary values are converted to an image for training and
testing the CNN. This approach converts all the NSL-KDD features into image
format. Even though the pre-processing was very structured, the performance of the
IDS was analyzed using the available ResNet50 and GoogLeNet architectures, which are
well known in image processing applications. The accuracies were not satisfactory and
multi-class classification was not discussed, which led us to investigate further the
performance of CNN for multi-class classification. Another work on CNN-based
IDS [19] used the 10% KDDcup 99 dataset. Although it reports better accuracies, it
used only 10% of the dataset, so it is not considered in our benchmarking.
Some common drawbacks of the existing approaches are as follows. Many approaches use
the same NSL-KDD training dataset for both training and testing and therefore report
better detection accuracies. This is not a fair measure, as the features for normal and
attack records differ between the NSL-KDD training and testing datasets. In other works,
only a selected part of the training and testing datasets is used to evaluate the developed
mechanism, which results in biased outputs. Very few works concentrate on multiclass
attack classification, and the use of CNN for IDS is uncommon.
3 Datasets
Different datasets exist for evaluating the performance of a developed IDS, of
which the NSL-KDD dataset is widely used by researchers. As it had become old, a new
dataset named UNSW-NB 15 was developed in 2015. No research using this dataset
for intrusion detection was found. As it has more attack classes and is larger than
NSL-KDD, we also used this dataset to train and test the performance of
the proposed approach. More information about the datasets is given below.
attributes namely the attribute name, their description and sample data is given in [21].
The features in the NSL-KDD dataset are of different data types. They can be
grouped into three categories: basic features, traffic features and content features.
Apart from normal data, records for 39 different attack types exist in the NSL-KDD
dataset. All these attack types are grouped into four attack classes. A summary of the
attack classes and their attack types, with detailed information, is available in [21].
Figure 1 gives an overview of the NSL-KDD datasets used for training and testing
the developed IDS, showing the number of data elements in the entire dataset.
• The training and testing sets have different distributions of attack types. For instance,
the training and testing sets of the existing benchmark datasets contain different
data types.
The UNSW-NB 15 dataset includes 49 features in total and has nine attack
classes. The attack classes and the attack categories are defined in [24].
Figure 2 gives an overview of the UNSW-NB 15 dataset used for training and
testing the developed IDS, showing the number of data elements in the entire dataset.
The features of numeric type include both integer and float variables. Min-Max
normalization approach is used to normalize the continuous data into the range of
[0, 1]. The mathematical formula for Min-Max normalization is represented in Eq. 1.
\[ x_{new} = \frac{x - x_{min}}{x_{max} - x_{min}} \tag{1} \]
where x stands for an individual numeric feature value, x_min stands for the minimum
value of the feature, x_max stands for the maximum value, and x_new stands for the
pre-processed value after normalization. After normalization, each value is discretized
into 10 intervals of width 0.1. The discretized values are then converted to binary
format using a one-hot encoding scheme. The binary features in the dataset are taken as
they are. After pre-processing, each NSL-KDD network packet becomes a binary vector
with 464 dimensions.
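The pre-processing chain for one numeric feature (Min-Max normalization, discretization into 10 intervals, one-hot encoding) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and the sample values are hypothetical, and the handling of values that fall exactly on an interval boundary is a choice made here, not something the text specifies.

```python
import numpy as np

def min_max_normalize(column):
    """Scale a 1-D numeric feature column into [0, 1] following Eq. 1."""
    column = np.asarray(column, dtype=float)
    lo, hi = column.min(), column.max()
    if hi == lo:
        # Constant feature: avoid division by zero (assumed convention).
        return np.zeros_like(column)
    return (column - lo) / (hi - lo)

def discretize(normalized, n_bins=10):
    """Map values in [0, 1] to interval indices 0..n_bins-1 (width 1/n_bins)."""
    idx = np.floor(np.asarray(normalized) * n_bins).astype(int)
    # The value 1.0 would land in bin n_bins, so it is folded into the last bin.
    return np.clip(idx, 0, n_bins - 1)

def one_hot(indices, n_bins=10):
    """Turn interval indices into one-hot rows of length n_bins."""
    return np.eye(n_bins, dtype=np.uint8)[indices]

# Hypothetical values of a single numeric feature for four records.
feature = [0.0, 2.5, 5.0, 10.0]
encoded = one_hot(discretize(min_max_normalize(feature)))
print(encoded)
```

Each numeric feature contributes 10 bits per record in this scheme; concatenating the encoded numeric features with the untouched binary features gives the final binary vector.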
These extended binary vectors are then transformed into an 8 × 8 grayscale image in
the image representation stage. Each group of 8 bits from the binary vector is
translated into one grayscale pixel, so the 464-bit vector yields 58 grayscale
pixels. To make an 8 × 8 image (64 pixels), the remaining 6 pixels are padded with 0's.
The grayscale images of the individual categories are shown in Fig. 4.
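The image representation stage can be sketched in Python as below. The bit order inside each pixel (most significant bit first), the row-by-row fill and the placement of the padding at the end are assumptions made for this illustration; the function name `bits_to_image` is hypothetical.

```python
import numpy as np

def bits_to_image(bits, side=8):
    """Pack a binary vector into a side x side grayscale image (uint8)."""
    bits = np.asarray(bits, dtype=np.uint8)
    if bits.size % 8 != 0:
        raise ValueError("bit vector length must be a multiple of 8")
    pixels = np.packbits(bits)          # every 8 bits become one byte, MSB first
    if pixels.size > side * side:
        raise ValueError("bit vector does not fit into the image")
    image = np.zeros(side * side, dtype=np.uint8)
    image[:pixels.size] = pixels        # remaining pixels stay 0 (padding)
    return image.reshape(side, side)

# A made-up 464-bit record: the first 8 bits set, plus bit 15.
record = np.zeros(464, dtype=np.uint8)
record[:8] = 1
record[15] = 1
image = bits_to_image(record)
print(image)
```

With 464 bits the function produces 58 pixel values and leaves the last 6 positions of the 8 × 8 grid at zero, matching the counts given in the text.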
The images generated in the image representation stage of our framework are shown
in Fig. 4. They are only a sample from the entire set of generated images. A close look
at the images reveals slight differences between the normal and the different attack
types. These images are fed to the CNN for training as well as for testing the
performance of the trained IDS.
Combining the above-mentioned key parameters forms the CNN. The convolution
and pooling layers act as a feature extraction mechanism for an image, while the fully
connected layer acts as a classifier. A more detailed discussion of CNN is given in
[29]. Figure 5 gives a detailed overview of the above-mentioned concepts in
relation to our application of CNN for IDS. The detailed functionality of the
implemented CNN model is discussed in the next section.
5.1 Implementation
The proposed architecture was implemented in MATLAB 2017b using the deep learning
libraries provided by MathWorks [30]. These libraries have improved considerably
compared to previous versions and provide many configuration parameters and much
flexibility, which makes the deep learning algorithms tunable to many individual
applications and needs. MATLAB also provides the option of training deep learning
algorithms on a CPU as well as on a GPU, and it provides real-time UDP communication,
which can in future be used for developing a real-time deep learning based IDS [31].
Figure 5 shows the CNN model implemented in MATLAB. It includes the
following steps:
Step 1: From the pre-processing and image representation stage we generated the image
dataset with each image of size 8 × 8. This is given as an input image to CNN –
Hidden Layer 1.
Step 2: The CNN layers are initialized with random weights and filters, and
these are adapted during the training process.
Step 3: The network takes the input image and initiates the training process. The
image goes through the forward propagation steps (convolution, ReLU and pooling
operations along with forward propagation of the fully connected layers) and finds
the output probabilities.
Step 4: The error between the desired output and the generated output is calculated.
Validation is performed after every 300 iterations.
Step 5: Backpropagation with gradient descent is used to update the network
weights and all filter values to minimize the output error.
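The periodic validation in Step 4 is typically paired with a stopping rule: training halts once the validation loss has failed to improve for a fixed number of consecutive checks (the patience). The Python sketch below illustrates that rule only. The function name `stopping_point` and the loss values are hypothetical, and the exact criterion applied by the MATLAB training routine may differ from this reading.

```python
def stopping_point(validation_losses, patience=5):
    """Return (index of the check where training stops, best loss seen).

    A check counts as a stall when its loss is not strictly lower than
    the best loss so far; `patience` consecutive stalls end training.
    """
    best = float("inf")
    stalls = 0
    last = -1
    for last, loss in enumerate(validation_losses):
        if loss < best:
            best, stalls = loss, 0      # improvement resets the stall counter
        else:
            stalls += 1
            if stalls >= patience:
                break                   # patience exhausted: stop training
    return last, best

# Made-up validation losses, one per validation check.
losses = [0.90, 0.70, 0.60, 0.60, 0.61, 0.60, 0.62, 0.60, 0.50]
stop_index, best_loss = stopping_point(losses, patience=5)
print(stop_index, best_loss)
```

In this made-up sequence the later, lower loss is never reached, because the patience runs out first. That is the trade-off of the rule: it limits overfitting at the cost of possibly stopping before a late improvement.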
The above steps are repeated until the validation measure stops improving for five
validations, as the patience was set to 5. This protects the network from overfitting.
A narrow convolution is used in the first convolutional hidden layer, so the
output feature map of this layer is smaller than 8 × 8. For this reason, wide
convolution is used in the second and third hidden layers by
padding the feature maps with zeros. Softmax regression with a non-linear sigmoid
transfer function is used for classification of attack classes at the final fully connected
layer. The output of the trained CNN covers the multiple attack classes present in the dataset.
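The effect of zero padding on the feature-map size follows from the standard output-size formula for a convolution. The Python sketch below assumes a hypothetical 3 × 3 filter and a stride of 1, since neither is stated here; it also shows two common padding amounts, because the text does not say how many zeros are added.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Side length of the feature map produced by a square convolution."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

kernel = 3  # hypothetical filter size, not taken from the paper

# Narrow convolution (no padding): the 8 x 8 input shrinks.
narrow = conv_output_size(8, kernel)

# Zero padding of (kernel - 1) // 2 keeps the feature-map size unchanged.
same = conv_output_size(narrow, kernel, padding=(kernel - 1) // 2)

# Zero padding of kernel - 1 makes the feature map grow.
full = conv_output_size(narrow, kernel, padding=kernel - 1)

print(narrow, same, full)
```

Without padding in the later layers, three stacked 3 × 3 convolutions would reduce an 8 × 8 input to 2 × 2, which leaves little for the pooling and fully connected layers to work with.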
The above steps train the CNN by optimizing the weights and filters so that it correctly
classifies the training images into attack classes. New test images are then given
as input to the trained CNN, which performs only forward propagation
and outputs the probability for each class (the output probabilities are calculated using
the weights optimized during the training process). Based on the outputs, the
CNN-based IDS is fine-tuned for the best possible configuration by modifying the
configuration parameters, and the performance metrics are evaluated.
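The final step of the forward pass, turning the scores of the last fully connected layer into class probabilities and a predicted class, can be sketched in Python as follows. The scores are made up, and the class names are the four attack classes conventionally used for NSL-KDD plus the normal class; they are not listed in this section and are included only for illustration.

```python
import numpy as np

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Assumed class order for illustration only.
CLASSES = ["Normal", "DoS", "Probe", "R2L", "U2R"]

# Hypothetical output of the last fully connected layer for one test image.
scores = [2.0, 1.0, 0.1, -1.0, -2.0]
probabilities = softmax(scores)
predicted = CLASSES[int(np.argmax(probabilities))]
print(predicted, probabilities.round(3))
```

The predicted class is simply the one with the highest probability; the probabilities themselves can also be thresholded if a more conservative decision is needed.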
Fig. 6. CNN based detection accuracies for multiple attack classes NSL-KDD dataset
Table 2. CNN based detection accuracies for multiple attack classes on the UNSW-NB 15 dataset
Class                        Detection accuracy
Normal                       99.70%
Generic                      97.70%
Exploit                      61.80%
Fuzzers                      6.8%
DoS                          0
Reconnaissance               0
Analysis                     0
Backdoor                     0
Shellcode                    0
Worms                        0
Overall detection rate (%)   94.9%
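A high overall detection rate can coexist with classes that are never detected when the test set is dominated by a few classes. The Python sketch below shows the arithmetic on toy labels. It assumes that a per-class figure is the fraction of that class's records classified correctly; the labels are made up and are not the UNSW-NB 15 class counts.

```python
import numpy as np

def per_class_and_overall(y_true, y_pred, n_classes):
    """Return ({class: fraction correct}, overall fraction correct)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    correct = y_true == y_pred
    per_class = {}
    for c in range(n_classes):
        mask = y_true == c
        # Classes absent from the test labels have no defined accuracy.
        per_class[c] = float(correct[mask].mean()) if mask.any() else float("nan")
    return per_class, float(correct.mean())

# Toy data: 95 records of class 0, 5 records of class 1,
# and a classifier that always answers class 0.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
per_class, overall = per_class_and_overall(y_true, y_pred, n_classes=2)
print(per_class, overall)
```

For this reason, per-class figures are more informative than the overall rate when the classes are imbalanced.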
Finally, we compared the performance of our approach with other existing
CNN-based approaches that used NSL-KDD for training and testing. As they reported
overall accuracy, we report our approach in the same manner.
Table 3 compares the overall detection accuracies of the existing approaches with that
of our approach, and the results look promising. The performance of other deep learning
techniques on the NSL-KDD dataset was evaluated in [14]. From our results, CNN looks
promising.
Table 3. Performance comparison of our approach with existing CNN based approaches
Technique Test accuracy
CNN – ResNet 50 79.14%
CNN – GoogLeNet 77.04%
CNN – Proposed approach 91.14%
This research focuses on CNN-based intrusion detection using the NSL-KDD and
UNSW-NB 15 datasets. The network packets of the datasets are first pre-processed
and then converted into images.
A CNN architecture is developed to train and test the performance of the developed IDS.
Multi-class attack classification is performed, which sets this work apart from the
existing approaches; with proper training, better detection accuracies were achieved
than with the existing CNN-based approaches. For multiclass attack classification,
CNN did not outperform other deep learning-based IDS such as SAE and DBN [16].
However, the classification of the normal class reached almost 99% accuracy, which
other deep learning approaches were unable to achieve. This indicates that a proper
training dataset will make the CNN a better classification algorithm for intrusion
detection. Converting network packets to image format makes the use of CNN possible
and avoids the process of feature selection, which is a clear advantage over existing
deep learning techniques. Nevertheless, every deep learning algorithm needs to
be evaluated for the individual use case. Finally, along with other deep learning
approaches such as SAE and DBN, CNN is also a good approach for developing IDS
in ICS applications.
As future work, better image conversion and image representation techniques
need to be identified. In future, we will test our algorithms with other new datasets.
To counter imbalanced datasets and train the network efficiently for all attack
types, Generative Adversarial Networks (cGAN) could be considered. We also consider
simulating our own dataset in the context of ICS for more precise, application-specific
development. As MATLAB supports real-time network traffic acquisition, deep
learning algorithms will be implemented on real-time network traffic data.
References
1. Stallings, W.: Network security essentials: applications and standards (2000)
2. Scarfone, K., Mell, P.: Guide to intrusion detection and prevention systems (IDPS)
recommendations of the National Institute of Standards and Technology. NIST Spec. Publ.
800–94, 127 (2007)
3. Tofino Security, SCADA Security Basics: Why Industrial Networks are Different than IT
Networks (2012). https://www.tofinosecurity.com/blog/scada-security-basics-why-industrial-networks-are-different-it-networks
4. Colbert, E.J.M., Kott, A. (eds.): Cyber-security of SCADA and Other Industrial Control
Systems. AIS, vol. 66. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-32125-7
5. MIT Lincoln Laboratory, DARPA Intrusion Detection Data Sets.
https://www.ll.mit.edu/ideval/data/. Accessed 07 Apr 2016
6. Sahu, S.K., Sarangi, S., Jena, S.K.: A detail analysis on intrusion detection datasets. In:
Souvenir 2014 IEEE International Advance Computing Conference (IACC 2014), May,
pp. 1348–1353 (2014)
7. McHugh, J.: Testing intrusion detection systems: a critique of the 1998 and 1999 DARPA
intrusion detection system evaluations as performed by Lincoln Laboratory. ACM Trans.
Inf. Syst. Secur. 3(4), 262–294 (2000)
27. Wu, H., Gu, X.: Max-pooling dropout for regularization of convolutional neural networks.
In: Arik, S., Huang, T., Lai, W.K., Liu, Q. (eds.) ICONIP 2015. LNCS, vol. 9489, pp. 46–
54. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-26532-2_6
28. Karn, U.: An intuitive explanation of convolutional neural networks. The Data Science Blog
(2016). https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/. Accessed 06 May
2018
29. Bhandare, A., Bhide, M., Gokhale, P., Chandavarkar, R.: Applications of convolutional
neural networks. Int. J. Comput. Sci. Inf. Technol. 7(5), 2206–2215 (2016)
30. MathWorks, Deep Learning Basics, Documentation (2018).
https://www.mathworks.com/help/nnet/deep-learning-basics.html. Accessed 06 May 2018
31. MathWorks, Real-Time UDP (2018).
https://www.mathworks.com/help/xpc/real-time-udp.html. Accessed 06 May 2018