Deep Learning Approach Combining Sparse Autoencoder With SVM for Network Intrusion Detection

Article in IEEE Access · September 2018


DOI: 10.1109/ACCESS.2018.2869577



Received August 13, 2018, accepted September 5, 2018, date of publication September 13, 2018, date of current version October 12, 2018.
Digital Object Identifier 10.1109/ACCESS.2018.2869577

MAJJED AL-QATF, YU LASHENG, MOHAMMED AL-HABIB, AND KAMAL AL-SABAHI
School of Information Science and Engineering, Central South University, Changsha 410083, China
Corresponding author: Yu Lasheng ([email protected])
This work was supported by the National Natural Science Foundation of China under Grant Z201610110620003.

ABSTRACT Network intrusion detection systems (NIDSs) provide a better solution to network security
than other traditional network defense technologies, such as firewall systems. The success of NIDS is highly
dependent on the performance of the algorithms and improvement methods used to increase the classification
accuracy and decrease the training and testing times of the algorithms. We propose an effective deep learning
approach, self-taught learning (STL)-IDS, based on the STL framework. The proposed approach is used
for feature learning and dimensionality reduction. It reduces training and testing time considerably and
effectively improves the prediction accuracy of support vector machines (SVM) with regard to attacks. The
proposed model is built using the sparse autoencoder mechanism, which is an effective learning algorithm for
reconstructing a new feature representation in an unsupervised manner. After the pre-training stage, the new
features are fed into the SVM algorithm to improve its detection capability for intrusion and classification
accuracy. Moreover, the efficiency of the approach in binary and multiclass classification is studied and
compared with that of shallow classification methods, such as J48, naive Bayesian, random forest, and SVM.
Results show that our approach has accelerated SVM training and testing times and performed better than
most of the previous approaches in terms of performance metrics in binary and multiclass classification. The
proposed STL-IDS approach improves network intrusion detection and provides a new research method for
intrusion detection.

INDEX TERMS Network security, network intrusion detection system, deep learning, sparse autoencoder,
SVM, self-taught learning, NSL-KDD.

I. INTRODUCTION
Approximately 50 billion devices are expected to be connected to the Internet by 2020 due to the wide range of communication and network technologies that have changed our daily lives. These technologies have been used worldwide in nearly all organizational operations, such as online shopping, banking, industrial applications, and email systems. Although the benefits provided by these technologies have improved our lives and changed the world, information security remains a crucial issue. Organizations need to provide secure communication channels to Internet users, including the organizations' customers and employees, and detect unauthorized activities. Currently, network intrusion detection systems (NIDS) offer a better solution to the security problem compared with other traditional network defense technologies, such as firewall systems. NIDS helps network administrators detect attacks, vulnerabilities, and breaches inside an organization's network. The two forms of NIDS are signature-based NIDS (SNIDS) and anomaly detection-based NIDS (ADNIDS). In SNIDS, the system detects attacks on the basis of rules that are pre-installed for attacks in NIDS. Network traffic is compared with an updated database of attack signatures to detect intrusion in the network traffic dataset.
In ADNIDS, the system classifies unknown or unusual behavior in network traffic by studying the structures of normal behavior in network traffic. Network traffic that deviates from a normal traffic pattern is classified as an intrusion. The advantage of ADNIDS is that unknown/new attacks can be predicted. Therefore, we focus on this type of intrusion detection systems. Anomaly detection methods can be used in various areas, such as network security, fraud detection in

2169-3536 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
VOLUME 6, 2018 52843
M. Al-Qatf et al.: Deep Learning Approach Combining SAE With SVM

credit cards, military applications, and many medical applications [1]. The following can be performed to develop an effective anomaly-based intrusion detection system. First, proper feature selection based on feature extraction and dimensionality reduction must be implemented when extracting a subset of correlated features from the network traffic dataset to enhance classification results [2]. Second, the best techniques must be used to help enhance the classification results and increase the classification speed.
Various supervised and unsupervised machine learning techniques can be used or integrated with other algorithms in ADNIDS to enhance intrusion detection performance and increase the classification rate of machine learning algorithms, such as decision tree, random forest, self-organizing maps (SOM), and support vector machine (SVM), which have been utilized to detect and classify intrusions. Many researchers who investigated NIDS focused on using unsupervised learning techniques followed by shallow machine learning, such as SVM, random forest, and naïve Bayesian [3], [4], because using unsupervised learning techniques before shallow machine learning offers improvements in detection rate. Unsupervised learning algorithms and dimension reduction methods are frequently used in feature extraction and feature representation to improve data quality [5] and achieve an improvement in the classification results of shallow and traditional supervised machine learning algorithms. Recently, unsupervised deep learning-based methods have been applied with remarkable success in computer vision applications. In addition, deep learning techniques [6]–[10] can be adopted as unsupervised feature learning methods that help supervised machine learning improve its performance and identification of network traffic anomalies by reducing the testing and training times.
Deep learning approaches have a good potential to achieve effective data representation for building improved approaches. Therefore, on the basis of a self-taught learning framework and inspired by the combination of the sparse autoencoder (SAE) with SVM, we propose using self-taught learning (STL) for good data representation and SVM for the classification task. STL is a deep learning approach that is based on the SAE algorithm. It helps rebuild input representation and converts it to feature representation of data related to the input data, thereby improving the performance of the classification task considerably. The main contributions of this work are as follows:
(1) We develop a novel deep learning approach, STL-IDS (a self-taught learning based intrusion detection system), based on the STL framework by combining SAE and SVM for network intrusion detection. We study the potential of our approach to achieve effective representation and dimensionality reduction for the improvement of the classification results of shallow and traditional supervised machine learning algorithms, such as SVM, in binary and multiclass classification.
(2) We combine deep and shallow learning techniques in our novel approach and exploit their respective strengths. Better or at least competitive results are achieved compared with the results of similar approaches. Moreover, our approach considerably reduces training and testing times of SVM.
(3) We use the NSL-KDD dataset to compare the efficiency of our approach with single SVM and that of different classification algorithms, such as naïve Bayesian, random forest, multi-layer perceptron, and many other classification algorithms in related work on binary and multiclass classification.
Experimental results show that our approach is suitable for intrusion detection. Its performance is superior to that of traditional classification algorithms using the NSL-KDD dataset and most previous approaches in terms of binary and multiclass classification.
The rest of this paper is structured as follows. We briefly describe related studies, particularly those that examined unsupervised machine learning techniques and SVM-based deep learning approaches, in Section II. In Section III, we provide an overview of our proposed methodology for NIDS implementation, SVM, the NSL-KDD dataset, data processing, and evaluation metrics. In Section IV, the performance of our approach is evaluated based on the experimental results and compared with that of related approaches for NIDS. Our conclusions and directions for future research are presented in Section V.

II. RELATED WORK
Network intrusion detection has become the most important part of the infrastructure of defense networking systems in information security. Various machine learning algorithms or approaches are applied in NIDS to detect and distinguish between normal traffic and anomalies or attacks in network traffic; these approaches include decision tree [11], k-nearest neighbor (K-NN) [12], naïve Bayes network [13], [14], SOM [15], [16], SVM, and artificial neural network (ANN) [17]. SVM demonstrates better performance than other traditional machine learning classification techniques [18].
A work proposed by Mukkamala et al. [19] compared the performance of SVM and ANN on the KDD CUP 99 dataset. The results showed that the detection results of SVM are better than those of ANN. In [20], SVM, naïve Bayes, logistic regression, decision tree (DT), and classification and regression tree (CART) approaches were compared in terms of intrusion detection classification by using the KDD CUP 99 dataset. The results showed that SVM has distinct features. Ashfaq et al. [21] developed a new method by using the fuzziness approach based on semi-supervised learning for intrusion detection. This method uses a neural network with random weights and plays an important role in the detection rate of NIDS because it decreases the computational cost. The model was evaluated on the NSL-KDD dataset, but its performance was studied only on the binary classification task. In [22], a deep learning model based on a recurrent neural network with a soft-max classifier


was presented. The model was evaluated on the NSL-KDD dataset, and the performance of the model in binary and multiclass classification was studied. The model showed a deep learning capability to model high-dimensional features, and an improved accuracy rate of intrusion detection was achieved. However, the training time was large. SVM is one of the most important traditional machine learning algorithms that depend on statistical learning theory, and it uses structural risk minimization to achieve a strong generalization capability. However, SVM presents a constrained quadratic programming problem that requires a large memory and considerable training time. In addition, the training complexity of SVM is highly dependent on the size of the dataset. Therefore, the performance of SVM-based IDS needs to be enhanced, and the training and testing times must be reduced. To address these limitations, many studies have improved SVM-based IDS by combining SVM with other methods. In [23], efficient machine learning based on SVM with feature augmentation was presented to increase the quality of the SVM classifier. The method improved the intrusion detection rate of SVM and reduced the required training time. The weakness of this approach is that its detection accuracy is insufficient, and the time factor is not considered.
Many researchers have combined supervised and unsupervised learning algorithms to create a model that can increase the detection rate of supervised machine learning classifiers, such as SVM and random forest. In [24], many unsupervised learning algorithms were combined with SVM and a neural network (NN) to improve the performance of the intrusion detection system. The authors designed, implemented, and evaluated many hybrid models that use principal component analysis (PCA) or Gradual Feature Reduction (GFR) for feature selection and SVM or NN for classification. The results showed that hybrid models can effectively detect known and unknown attacks, but the PCA and GFR feature selection techniques are computationally expensive in terms of training and testing times. Unsupervised learning based on deep learning has been used recently in feature extraction and dimensionality reduction, leading to an increase in the detection rate and a decrease in the processing time of supervised machine learning algorithms, such as SVM and soft-max. Alom et al. [8] proposed a deep learning approach based on stacked restricted Boltzmann machines for feature extraction and dimensionality reduction and on SVM for the classification of network intrusions. The approach was implemented on merely 40% of the NSL-KDD training dataset. The approach performed better than single SVM or single deep belief networks (DBN) and many other approaches. In [10], a feature learning model based on AE was presented to achieve a good representation of different feature sets. This feature learning model was applied to malware classification and anomaly-based network intrusion detection by using the NSL-KDD dataset. The topology of the AE used was different from the common topology, and the features extracted by the AE were applied to many traditional machine learning algorithms, such as SVM, K-NN, and Gaussian naïve Bayes. The experimental results showed that the model is an improvement over traditional machine learning. However, the model is computationally expensive because it comprises many hidden layers and entails two training stages. Diro and Chilamkurti [25] presented an IoT/fog network attack detection system. Their system was based on distributed deep learning. The performance of their model was compared against shallow and traditional machine learning approaches. They used a deep learning model with three hidden layers for feature learning and soft-max regression (SMR) for the classification task. However, their model is computationally expensive compared to our approach. Their model was evaluated on the NSL-KDD dataset in both binary and multiclass classification. Their method demonstrated that distributed attack detection can detect attacks better than a centralized detection system because of the parameter sharing that can avoid local minima in training. Our approach differs significantly: we aimed at adopting a new approach to enable the detection of attacks through a centralized detection system, whereas their model aimed at enabling the detection of attacks through a distributed detection system in the social internet of things.
Wang et al. [26] proposed a novel intrusion detection system called hierarchical spatial-temporal feature-based intrusion detection system (HAST-IDS); their system used a deep convolutional neural network for learning the low-level spatial features of network traffic and long short-term memory (LSTM) networks for learning high-level temporal features. They used the standard DARPA and ISCX2012 datasets to evaluate the performance of their proposed system. Their model is computationally expensive compared to our approach because they used two stages for feature learning. Farahnakian and Heikkonen [27] proposed a deep learning approach for intrusion detection. The model was built using a deep autoencoder and trained in a greedy layer-wise fashion in order to avoid overfitting. The performance of their approach was evaluated on KDD CUP 99 (the older version of NSL-KDD) in both binary and multiclass classification. Although high accuracy was achieved for the intrusion detection task, their approach is computationally expensive compared to our approach.
Madani and Vlajic [28] studied the viability of using the deep autoencoder for anomaly detection in adaptive intrusion detection systems under adversarial contamination. They used the reconstruction error of the autoencoder as a measure for anomaly detection and the NSL-KDD dataset for performance evaluation. Our approach is significantly different since we used the autoencoder for feature learning and dimensionality reduction. Moreover, we used the performance metrics, training time, and testing time to evaluate the performance of our model for anomaly detection.
In [2], NIDS based on unsupervised deep learning techniques was developed using SAE for feature learning and soft-max regression (SMR) for classification. Evaluation was based on all performance metrics on the NSL-KDD


dataset using the two evaluation approaches in Section IV: KDDTrain+ using 10-fold cross-validation for two-category (normal traffic and attacks) and five-category (normal and four types of attacks) classification. Furthermore, evaluations were performed separately for training and testing based on KDDTrain+ and KDDTest+. The evaluation results showed that the approach performs better in terms of accuracy rate for two-category classification compared with SMR and many other approaches. The weakness of this approach is that the dimensionality reduction mechanism, which significantly reduces the training and testing times for intrusion detection, is not considered. In addition, both evaluation approaches obtained low accuracy in the five-category classification. Although many unsupervised learning-based network intrusion detection methods have been presented in recent years, several of them still suffer from limitations and issues, such as the following:
• Shallow learning is inappropriate for intelligent analysis, and the predicting requirements of high-dimensional learning have redundant features. Hence, a raw dataset leads to reduced classifier accuracy. By contrast, deep learners can achieve an effective representation, thus improving the classification results of shallow and traditional supervised machine learning algorithms, such as SVM and RF.
• Although several approaches that depend on deep learning techniques are effective, they continue to suffer from time complexity.
On the basis of this analysis, we propose a combination of SAE and SVM based on STL. First, we use SAE for effective representation of our raw dataset (NSL-KDD), followed by the use of SVM for classification. The accuracy rate of SVM and training and testing times are optimized simultaneously.

III. PROPOSED METHODOLOGY: SAE–SVM AND NSL-KDD DATASET OVERVIEW
A. PROBLEM FORMULATION
In our self-taught learning approach (SAE–SVM), we are given a labeled training set of m records $\{(x_l^{(1)}, y^{(1)}), (x_l^{(2)}, y^{(2)}), \ldots, (x_l^{(m)}, y^{(m)})\}$, where the input feature vector $x_l^{(i)} \in \mathbb{R}^n$ (the subscript "l" indicates that it is a labeled record), $y^{(i)} \in \{+1, -1\}$ are the corresponding labels for binary classification, and $y^{(i)} \in \{1, 2, \ldots, C\}$ are the corresponding labels for multiclass classification. Additionally, we assume there are m unlabeled samples $x_u^{(1)}, x_u^{(2)}, \ldots, x_u^{(m)} \in \mathbb{R}^n$ produced by removing the labels from the labeled training set. For a better representation and lower dimensionality of the input training set $x_l^{(1)}, x_l^{(2)}, \ldots, x_l^{(m)} \in \mathbb{R}^n$, as in Figure 1-STEP1, we feed the unlabeled samples $x_u^{(1)}, x_u^{(2)}, \ldots, x_u^{(m)} \in \mathbb{R}^n$ (KDDTrain+) to the sparse autoencoder algorithm. It can be used to reconstruct and learn the input training dataset $x_l^{(1)}, x_l^{(2)}, \ldots, x_l^{(m)} \in \mathbb{R}^n$. After learning the optimal values for $W$ and $b_1$ (trained parameter set in Figure 1) by applying SAE on unlabeled data $x_u$ (KDDTrain+), as in Figure 1-STEP2, we feed $x_l^{(1)}, x_l^{(2)}, \ldots, x_l^{(m)}$ (KDDTrain+ and KDDTest+ dataset) as an input to a sparse autoencoder, which attempts to reconstruct and learn its output values $(\hat{x}_l^{(1)}, \hat{x}_l^{(2)}, \hat{x}_l^{(3)}, \ldots, \hat{x}_l^{(m)})$ to be equal to its inputs $(x_l^{(1)}, x_l^{(2)}, \ldots, x_l^{(m)})$, getting a new and good representation $\{(h_l^{(1)}, y^{(1)}), (h_l^{(2)}, y^{(2)}), \ldots, (h_l^{(m)}, y^{(m)})\}$ where the original input data is replaced with the corresponding vector of activations $h$ as in Figure 2. Thus, our training set becomes $\{(h_l^{(1)}, y^{(1)}), (h_l^{(2)}, y^{(2)}), \ldots, (h_l^{(m)}, y^{(m)})\}$.
Finally, we train SVM using the new training set to obtain a function that performs predictions of the intrusion on the y values. For the given testing set $x_{test}$, we follow the same scenario as for the training set: feeding it to the sparse autoencoder to get $h_{test}$. Then, we feed $h_{test}$ to the trained SVM classifier to get a prediction. Our goal is to improve SVM classification accuracy, accelerate training and testing, and develop a network intrusion detection model that can accurately and quickly predict intrusions in both binary and multiclass classification on the NSL-KDD dataset using the pre-learned sparse autoencoder with SVM. The detailed steps of the proposed approach will be presented in the next subsections. Our STL model based on SAE and SVM involves many steps (Figure 1). The basic methodology is as follows.

FIGURE 1. Block diagram of the proposed STL-IDS.

B. STL: SAE–SVM
STL [29] is a new deep learning framework that involves two stages. In the first stage, a new effective representation is obtained from our NSL-KDD dataset without labels, $x_u$; this stage is called unsupervised feature learning (UFL). The new


representation is related to unlabeled data. In the second stage, the new representation is combined with labeled data, $x_l$, then any supervised algorithm, such as SVM, can be used for the classification task.
Different methods can be used for UFL [30]; for our model, we adopt SAE, which is an unsupervised learning algorithm that consists only of a single hidden layer. It can be used for feature learning and dimensionality reduction instead of PCA to achieve a significantly nonlinear generalization. Its input and output layers have the same number of units. The input and output layers contain N units, and the hidden layer contains K units. As shown in Figure 2(a), the output values $\hat{x}_i$ in the output layer are similar to the input values $x_i$ in the input layer.

FIGURE 2. Self-taught learning stages.

In recent years, there has been increasing attention to the study of single-layer SAE as a feature learning and dimensionality reduction method. The SAE can learn effective low-dimensional features from the raw data and make it easier to extract efficient and appropriate low-dimensional features automatically for the classification process.
The feature extraction and dimensionality reduction process in SAE involves two steps: encoding and decoding. The encoding step maps the input data $x_i$ into the hidden units' representations, as shown in (1a):
$h = f(X) = g(WX + b_1)$ (1a)
The decoding step maps the hidden units' representations into the reconstructed data, as shown in (1b):
$Z = g(Vh + b_2)$ (1b)
In the above equations, $X = (x_1, x_2, x_3, x_4, \ldots, x_i)$ is the high-dimensional input data vector, $Z = (\hat{x}_1, \hat{x}_2, \hat{x}_3, \ldots, \hat{x}_m)$ is the reconstruction vector of the input data, and $h = (h_1, h_2, h_3, \ldots, h_K)$ is the low-dimensional vector output from the hidden layer.
SAE applies the backpropagation algorithm to obtain the optimal values for its weight matrices $W \in \mathbb{R}^{K \times N}$ and $V \in \mathbb{R}^{N \times K}$ and bias vectors $b_1 \in \mathbb{R}^{K \times 1}$ and $b_2 \in \mathbb{R}^{N \times 1}$, which attempt to learn and reconstruct its output values $\hat{x}_i$ to be equal to its inputs $x_i$. In other words, an approximation to the identity function is learned to make the output values similar to the input values; that is, it uses $y^{(i)} = x^{(i)}$ [3], [31]. The activation function is chosen to be the sigmoid function, $g(z) = \frac{1}{1 + e^{-z}}$, and its output range is [0,1]. It is used for the activation $h_{W,b}$ of the nodes in the hidden and output layers, as shown in (1a) and (1b).
$T = \frac{1}{2m} \sum_{i=1}^{m} \lVert x_i - \hat{x}_i \rVert^2 + \frac{\lambda}{2} \left( \sum_{k,n} W^2 + \sum_{n,k} V^2 + \sum_{k} b_1^2 + \sum_{n} b_2^2 \right) + \beta \sum_{j=1}^{K} \mathrm{KL}(\rho \,\|\, \hat{p}_j)$ (2)
SAE also applies backpropagation to minimize the cost function, which is represented by Eq. 2 [2]. The first term is the average sum-of-square errors for all m input data. The second term is a weight decay term, scaled by the parameter λ, used for tuning the weights between the hidden and output units to improve performance and prediction while helping check and avoid overfitting. The last term in the equation is the sparsity penalty term that places a constraint on the hidden layer to maintain low average activation values; it is expressed as the Kullback–Leibler (KL) divergence shown in Eq. 3 [31].
$\mathrm{KL}(\rho \,\|\, \hat{p}_j) = \rho \log \frac{\rho}{\hat{p}_j} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{p}_j}$, (3)
where ρ is a sparsity constraint parameter that ranges from 0 to 1 and β controls the sparsity penalty term. $\mathrm{KL}(\rho \,\|\, \hat{p}_j)$ attains a minimum value when $\rho = \hat{p}_j$, where $\hat{p}_j$ denotes the average activation value of hidden unit j over all training inputs x. After learning the optimal values for $W$ and $b_1$ by applying SAE on unlabeled data $x_u$, we evaluate the feature representation $a = h$ for labeled data $(x_l, y)$. We use this new feature representation, h, with the label vector, y, in SVM for the classification task in the second stage of STL, as shown in Figure 2(b). Figure 2 shows an architectural diagram of the proposed STL. We apply STL based on SAE for good data representation because of its simple and straightforward implementation and its capability to learn the original expressions and structures of data. STL has been widely applied, particularly to image identification [32], [33]. We use SVM for the classification task and for distinguishing different types of intrusions because combining robust classifiers, such as SVM, with SAE leads to enhanced performance in intrusion detection. Accordingly, the features extracted by the SAE algorithm are passed to the SVM classifier for intrusion detection. The accuracy rate of our method is better than that of SVM alone, and the training and testing times of SVM are reduced.
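The encoding and decoding steps in (1a) and (1b) and the cost in Eq. 2 can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the function names (`sigmoid`, `encode`, `decode`, `kl_divergence`, `sae_cost`), the toy weights, and the hyperparameter values are assumptions chosen only for illustration, and the backpropagation training step is omitted.

```python
import math

def sigmoid(z):
    """Sigmoid activation g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def affine(M, v, b):
    """Return M v + b for a matrix M given as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) + bi for row, bi in zip(M, b)]

def encode(W, b1, x):
    """Eq. (1a): h = g(W x + b1), mapping N inputs to K hidden units."""
    return [sigmoid(z) for z in affine(W, x, b1)]

def decode(V, b2, h):
    """Eq. (1b): x_hat = g(V h + b2), mapping K hidden units to N outputs."""
    return [sigmoid(z) for z in affine(V, h, b2)]

def kl_divergence(rho, p_hat):
    """Eq. (3): KL(rho || p_hat) for a single hidden unit."""
    return (rho * math.log(rho / p_hat)
            + (1 - rho) * math.log((1 - rho) / (1 - p_hat)))

def sae_cost(W, V, b1, b2, X, lam, beta, rho):
    """Eq. (2): reconstruction error + weight decay + sparsity penalty."""
    m = len(X)
    H = [encode(W, b1, x) for x in X]
    X_hat = [decode(V, b2, h) for h in H]
    recon = sum(sum((a - b) ** 2 for a, b in zip(x, xh))
                for x, xh in zip(X, X_hat)) / (2 * m)
    decay = (lam / 2) * (sum(w * w for row in W for w in row)
                         + sum(v * v for row in V for v in row)
                         + sum(b * b for b in b1)
                         + sum(b * b for b in b2))
    # p_hat[j]: average activation of hidden unit j over all m inputs
    p_hat = [sum(h[j] for h in H) / m for j in range(len(b1))]
    sparsity = beta * sum(kl_divergence(rho, p) for p in p_hat)
    return recon + decay + sparsity

# Toy example (assumed values): N = 3 inputs, K = 2 hidden units, m = 2 records.
W = [[0.1, -0.2, 0.3], [0.0, 0.4, -0.1]]
V = [[0.2, -0.1], [0.3, 0.1], [-0.2, 0.5]]
b1 = [0.0, 0.0]
b2 = [0.0, 0.0, 0.0]
X = [[0.0, 1.0, 0.5], [1.0, 0.0, 0.25]]
cost = sae_cost(W, V, b1, b2, X, lam=1e-4, beta=3.0, rho=0.05)
```

In the second stage of STL, the hidden activations returned by `encode` would play the role of the new feature vector h that is passed to the SVM classifier.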

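The statement that the KL term in Eq. 3 attains its minimum when the average activation equals the sparsity parameter can be checked with a short calculation. The derivation below is a worked illustration added for clarity and is not part of the original text; it assumes $0 < \rho < 1$ and $0 < \hat{p}_j < 1$.

```latex
\begin{aligned}
\frac{\partial}{\partial \hat{p}_j}\,\mathrm{KL}(\rho \,\|\, \hat{p}_j)
  &= -\frac{\rho}{\hat{p}_j} + \frac{1-\rho}{1-\hat{p}_j} = 0
  \;\Longrightarrow\; \rho\,(1-\hat{p}_j) = (1-\rho)\,\hat{p}_j
  \;\Longrightarrow\; \hat{p}_j = \rho, \\
\frac{\partial^2}{\partial \hat{p}_j^2}\,\mathrm{KL}(\rho \,\|\, \hat{p}_j)
  &= \frac{\rho}{\hat{p}_j^{2}} + \frac{1-\rho}{(1-\hat{p}_j)^{2}} > 0 .
\end{aligned}
```

Because the second derivative is positive, the stationary point $\hat{p}_j = \rho$ is the unique minimum, and the value of the KL term there is zero. The penalty therefore grows as the average activation of a hidden unit moves away from ρ.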

C. SVM TABLE 1. Attack types and categories.

The SVM classifier relies on statistical learning theory (SLT)


and produces a hyperplane to isolate a class of positive
instances from a class of negative instances by using struc-
tural risk minimization rules. SVM aims to split data points
with a hyperplane and determine which class each data point
belongs to. SVM maximizes the margin between support
vectors because separating all classes is necessary [34]. SVM
is a popular learning technique due to its high classifica-
tion accuracy and performance in solving regression and
classification tasks. SVM was initially designed for binary classification and was later extended to multi-class scenarios. Common SVM kernel functions include the linear, polynomial, sigmoid, and RBF kernels. We use the RBF kernel, which is also known as the Gaussian kernel, in our research. An SVM with the RBF kernel has two parameters: C and σ. Both can be tuned manually, and different parameter values yield classifiers with different behavior. The performance of SVM depends on selecting a suitable kernel function and proper kernel parameters for the problem at hand. In our proposed approach, we select the parameters automatically by applying k-fold cross-validation (CV) to search for the best parameters of the RBF kernel. Several strategies, such as one versus the rest (OvsR) and one versus one (OvsO), can be used to build a multi-class SVM classifier. We use the OvsR strategy in our method. Because SVM training is time-consuming, numerous approaches have been proposed to reduce the processing time required for classification and prediction tasks. The storage requirements and computational complexity of SVM with the RBF kernel depend on both the input dimensionality (d) and the number of support vectors (nSV). Generally, the storage requirements and computational complexity are bounded by O(d · nSV).

D. NSL-KDD DATASET OVERVIEW
The NSL-KDD dataset was recommended in 2009 by Tavallaee et al. [35] because of the inherent drawbacks of KDD CUP 99 [35]. Subsequently, many researchers in intrusion detection used NSL-KDD to evaluate their approaches, as was done in [21] and [36]. NSL-KDD was built on the KDD CUP 99 dataset, but the redundant instances were removed and the structure of the dataset was reconstituted [35]. The NSL-KDD dataset is normally used to evaluate the effectiveness of proposed approaches for intrusion detection, especially anomaly-based network intrusion detection. NSL-KDD has a reasonable number of records in its training and testing sets. The training set (KDDTrain+) has 125,973 records, and the testing set (KDDTest+) has 22,544 records. Each traffic record in the NSL-KDD dataset contains 41 features (6 symbolic and 35 continuous), as shown in Table 1, and 1 class label. The features can be categorized into three types: basic, content, and traffic (Table 1). According to feature characteristics, the attacks in the NSL-KDD dataset can be classified into four types: user-to-root attacks (U2R), denial of service attacks (DoS), remote-to-local attacks (R2L), and probing attacks (Probe). These types and categories are summarized in Table 2. Several attacks exist in the testing set (KDDTest+) but not in the training set (KDDTrain+). This difference between the training and testing sets provides a more realistic basis for evaluating intrusion detection.

E. DATA PREPROCESSING
1) 1-TO-N NUMERICAL ENCODING
The SAE-SVM algorithm used in our approach cannot directly process the NSL-KDD dataset in its original format. Therefore, we use a 1-to-n encoding system to convert non-numeric features into numeric features before applying STL, as shown in Figure 1. The NSL-KDD dataset has three non-numeric features and 38 numeric features. Hence, we apply 1-to-n encoding to the non-numeric features, namely “protocol-type,” “service,” and “flag,” as follows:
(1) We convert the “protocol-type” feature into a numeric feature. The “protocol-type” feature has three distinct values, namely, tcp, udp, and icmp, which are encoded as the binary vectors (1,0,0), (0,1,0), and (0,0,1), respectively.
(2) We convert the “service” and “flag” features into numeric features. The “service” feature has 70 distinct values, and the “flag” feature has 11 distinct values. Using the same method as in the first step, each value of “service” is mapped to a 70-dimensional binary vector, and each value of “flag” is mapped to an 11-dimensional binary vector. After all the transformations, the 41-dimensional features of the NSL-KDD dataset are mapped into 122-dimensional features (38 + 3 + 70 + 11).

2) NORMALIZATION
Several features of the NSL-KDD dataset have very large ranges between their maximum and minimum values. For example, “duration” takes values in [0, 58329], where the maximum value is 58,329 and the minimum is 0. A large difference also exists in other feature values, such as “src-bytes” and “dst-bytes,”

52848 VOLUME 6, 2018


M. Al-Qatf et al.: Deep Learning Approach Combining SAE With SVM

thereby making the feature values incomparable and unsuitable for processing. Hence, these features are normalized by using max-min normalization, which maps all feature values to the range [0, 1] according to Eq. (4).

$$x_i = \frac{x_i - \mathrm{Min}}{\mathrm{Max} - \mathrm{Min}}, \tag{4}$$

where $x_i$ denotes each data point, and Min and Max denote the minimum and maximum values over all data points for each feature.

TABLE 2. Feature details of the NSL-KDD dataset.

F. EVALUATION METRICS
We use NSL-KDD (KDDTrain+ and KDDTest+) to verify the superiority of our approach in improving the SVM classification results for network intrusion detection. The following performance metrics are used to measure the performance of our proposed approach. The counts that result from the training and testing processes on the NSL-KDD dataset are used to calculate these performance metrics. The counts are defined as follows:
• True positive (TP): anomaly instances correctly classified as an anomaly.
• False positive (FP): normal instances wrongly classified as an anomaly.
• True negative (TN): normal instances correctly classified as normal.
• False negative (FN): anomaly instances wrongly classified as normal.
Then, we compute the performance metrics from these counts as follows.
• Accuracy (AC): the proportion of correct classifications among the total records in the testing set, as shown in (5).

$$AC = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$

• Precision (P): the number of correctly predicted intrusions divided by the total number of predicted intrusions in the testing process, as shown in (6).

$$P = \frac{TP}{TP + FP} \tag{6}$$

• Recall (R): the number of correctly predicted intrusions divided by the total number of actual intrusion instances in the testing set, as shown in (7).

$$R = \frac{TP}{TP + FN} \tag{7}$$

• F-measure (F): the harmonic mean of precision (P) and recall (R), considered the most important metric of network intrusion detection because it represents both, as shown in (8).

$$F = \frac{2 \cdot P \cdot R}{P + R} \tag{8}$$

IV. PERFORMANCE EVALUATION: IMPACT OF THE LOW-DIMENSIONAL FEATURES AND DIFFERENT HIDDEN UNITS AND SPARSITY PARAMETER ON SVM CLASSIFIER
Experiments are performed on a PC with an Intel(R) Core(TM) i5-6400 CPU at 2.71 GHz with 8 GB of RAM, running Windows 10. Our approach was implemented in MATLAB, and the SVM classifier is applied with the LIBSVM package [37] (MATLAB version 3.22). Our dataset is processed in Python. The SVM classifier uses the RBF kernel, and k-fold cross-validation is applied to search for the best parameters of the RBF kernel. The performance evaluation of our approach based on the NSL-KDD dataset is performed in two ways as follows:
• Training (KDDTrain+) and testing (KDDTest+) data are used separately for training and testing.
• Ten-fold cross-validation is performed on KDDTrain+ for training and testing.
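The preprocessing of Section III-E that precedes these experiments can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' code: the function names, the dictionary-based record layout, and the toy records are assumptions made for the example, and the choice to map a constant feature to 0 is also an assumption, since Eq. (4) is undefined when Max equals Min.

```python
from typing import Dict, List, Sequence

# The three protocol-type values named in the text.
PROTOCOLS = ["tcp", "udp", "icmp"]


def one_hot(value: str, categories: Sequence[str]) -> List[float]:
    """Return the 1-to-N binary vector for `value` within `categories`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


def min_max_scale(column: Sequence[float]) -> List[float]:
    """Map a numeric column to [0, 1] following Eq. (4)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant feature is mapped to 0 (Eq. (4) is undefined here).
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def encode(records: List[dict],
           categorical: Dict[str, Sequence[str]],
           numeric: Sequence[str]) -> List[List[float]]:
    """One-hot encode the categorical features and min-max scale the numeric ones."""
    scaled = {name: min_max_scale([float(r[name]) for r in records])
              for name in numeric}
    rows = []
    for i, record in enumerate(records):
        row: List[float] = []
        for name, levels in categorical.items():
            row.extend(one_hot(record[name], levels))
        row.extend(scaled[name][i] for name in numeric)
        rows.append(row)
    return rows


# Toy records (hypothetical); the duration bounds 0 and 58329 come from the text.
toy = [
    {"protocol_type": "tcp", "duration": 0},
    {"protocol_type": "udp", "duration": 58329},
    {"protocol_type": "icmp", "duration": 29164.5},
]
rows = encode(toy, {"protocol_type": PROTOCOLS}, ["duration"])

# Width after encoding all of NSL-KDD: 38 numeric + 3 + 70 + 11 binary columns.
encoded_width = 38 + 3 + 70 + 11
```

In practice the minimum and maximum would normally be taken from the training set and reused for the testing set; the source does not say which convention was followed, so the sketch scales whatever column it is given.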


Our experiments are conducted to study the performance efficiency and verify the effectiveness of the low-dimensional features extracted by our approach for binary (normal, anomaly) and multiclass (normal, DoS, R2L, U2R, and Probe) classification based on the NSL-KDD dataset. Furthermore, the training and testing times are calculated to evaluate the efficiency of our model. In addition, we focus on the requirement that an intrusion detection system be fast and computationally inexpensive by reducing the computational complexity and storage requirements of the SVM classifier. To meet this requirement, we extract a good data representation and low-dimensional features from the raw data and feed them into the SVM classifier to reduce the number of support vectors (nSV), because non-linear kernels require memory and computation that grow linearly with the number of SVs [38]. Generally, the storage requirements and computational complexity of SVM grow linearly with the number of SVs [39].
Thus, because the storage requirements and computational complexity of SVM with the RBF kernel depend on both the input dimensionality (d) and the number of support vectors (nSV), as discussed in Section III-C, our model reduces the storage requirements and computational complexity compared with SVM alone: it produces a low-dimensional representation of the raw data, and fewer support vectors need to be stored, as shown in the nSV columns of Tables 3, 4, 5, and 6.
Finally, we compare the performance of our approach with that of existing methods, such as naive Bayesian, RF, multi-layer perceptron, SVM, and other shallow machine learning methods, as mentioned in [22] and [35], and with several recent approaches.

A. EVALUATION OF THE IMPACT OF THE LOW-DIMENSIONAL FEATURES ON THE BINARY CLASSIFICATION
1) EVALUATION BASED ON TESTING DATA
Training and testing data are used separately for training and testing when we evaluate the effectiveness of the low-dimensional features extracted by our approach for two-category classification. Figure 3 shows the experimental results. Our STL-IDS performs better than single SVM. The accuracy, precision, recall, and f-measure values for single SVM are 79.42%, 92.59%, 69.40%, and 79.42%, respectively. The corresponding values for STL-IDS are 84.96%, 96.23%, 76.57%, and 85.28%, respectively. Thus, STL-IDS performs better than single SVM in all performance metrics. The experimental results also show that the proposed STL-IDS approach reduces the training and testing times of SVM, as shown in Table 3.
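The metrics reported above follow Eqs. (5) to (8) and can be recomputed directly. The sketch below is illustrative only: the function names and the toy confusion counts are assumptions, while the precision and recall percentages passed to `f_measure` are the ones quoted in the text.

```python
def metrics(tp: int, tn: int, fp: int, fn: int):
    """Accuracy, precision, recall and F-measure from confusion counts, Eqs. (5)-(8)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f


def f_measure(precision: float, recall: float) -> float:
    """Eq. (8): harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# Toy confusion counts (hypothetical), only to exercise the formulas.
acc, p, r, f = metrics(tp=8, tn=9, fp=1, fn=2)

# F-measure recomputed from the precision and recall reported for binary
# classification on the testing data (values in percent).
f_stl_ids = f_measure(96.23, 76.57)
f_single_svm = f_measure(92.59, 69.40)
```

With the reported STL-IDS precision and recall, Eq. (8) gives about 85.28%, which agrees with the value in the text. For single SVM the same formula gives roughly 79.3%, slightly different from the 79.42% printed for its f-measure.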

TABLE 3. Training and testing time and number of support vectors (NSV) comparison for STL-IDS and single SVM for binary classification based on testing data.

FIGURE 3. Accuracy, precision, recall, and F-measure values for STL-IDS and single SVM for binary classification based on test data.

2) EVALUATION BASED ON TRAINING DATA
In this section, we use 10-fold cross-validation to evaluate the superiority of our proposed model by comparing its performance metrics and its training and testing times with those of single SVM. Figure 4 shows that the performance of our STL-IDS in binary classification is higher than that of single SVM. The accuracy, precision, recall, and f-measure values are 99.416%, 99.45%, 99.291%, and 99.373%, respectively, whereas single SVM achieves 99.35%, 98.98%, 99.62%, and 99.30%, respectively. STL-IDS performs better than single SVM in all performance metrics except recall, for which STL-IDS and single SVM obtain 99.29% and 99.62%, respectively. Moreover, Table 4 shows that our approach reduces the training and testing times of SVM, which is crucial for the efficiency of network security applications.

TABLE 4. Training and testing time and number of support vectors (NSV) comparison for STL-IDS and single SVM for binary classification based on training data.

TABLE 5. Training and testing time and number of support vectors (NSV) comparison for STL-IDS and single SVM for five-category classification based on testing data.
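The k-fold cross-validation used here, both for evaluation and for the RBF parameter search, can be outlined as follows. This is a generic sketch under stated assumptions rather than the procedure the authors ran: `score_fn` is a hypothetical callback that would train an RBF SVM (for instance through LIBSVM) on the training indices and return the validation accuracy, the power-of-two grids are illustrative, and the folds are shuffled but not stratified by class.

```python
import itertools
import math
import random


def k_fold_indices(n_samples: int, k: int, seed: int = 0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation


def grid_search(n_samples, score_fn, c_values, gamma_values, k=10):
    """Return (mean_score, C, gamma) with the best mean k-fold score."""
    best = None
    for c, gamma in itertools.product(c_values, gamma_values):
        scores = [score_fn(c, gamma, train, val)
                  for train, val in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if best is None or mean_score > best[0]:
            best = (mean_score, c, gamma)
    return best


def placeholder_score(c, gamma, train, val):
    """Stand-in for 'train an RBF SVM and score it'; peaks at C = 2^2.5, gamma = 1."""
    return -abs(math.log2(c) - 2.5) - abs(math.log2(gamma))


# Illustrative grids: C in 2^-2 ... 2^5 (half steps), gamma in 2^-4 ... 2^4.
c_grid = [2 ** (e / 2) for e in range(-4, 11)]
gamma_grid = [2.0 ** e for e in range(-4, 5)]
best_score, best_c, best_gamma = grid_search(100, placeholder_score,
                                             c_grid, gamma_grid)
```

Replacing `placeholder_score` with a real training-and-scoring routine turns the skeleton into an actual parameter search; the half-step grid is used only because it contains 2^2.5 ≈ 5.6569, the C value reported later in the text.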


FIGURE 4. Accuracy, precision, recall, and F-measure values for STL-IDS and single SVM for binary classification based on training data.

B. EVALUATION OF THE IMPACT OF THE LOW-DIMENSIONAL FEATURES ON THE MULTICLASS CLASSIFICATION
1) EVALUATION BASED ON TESTING DATA
To verify the superiority of our proposed approach for network intrusion detection, we measure the performance of our model in five-category classification using testing data and compare it with that of single SVM in terms of performance metrics and training and testing times. Figure 5 shows a comparison between STL-IDS and single SVM. All the performance metric values of STL-IDS are higher than those of single SVM. The accuracy, precision, recall, and f-measure values for STL-IDS are 80.48%, 93.92%, 68.28%, and 79.078%, respectively, whereas those for single SVM are only 76.76%, 92.98%, 61.85%, and 74.28%, respectively. Moreover, Table 5 shows that the training and testing times of our proposed model are less than those of single SVM, indicating that our model is more concise and efficient than single SVM.

FIGURE 5. Accuracy, precision, recall, and F-measure values for STL-IDS and single SVM for five-category classification based on test data.

2) EVALUATION BASED ON TRAINING DATA
Similarly, we apply 10-fold cross-validation on KDDTrain+ to evaluate the performance of our model for five-category classification. We also compare it with single SVM without the feature learning and dimensionality reduction mechanisms applied to the raw dataset. Figure 6 provides a comparison of the performance metrics of our model and single SVM. The accuracy, precision, and f-measure values for STL-IDS are 99.396%, 99.56%, and 99.34%, respectively, whereas those for single SVM are 99.346%, 99.061%, and 99.288%, respectively. STL-IDS has a lower recall value than single SVM: the recall values are 99.122% and 99.518% for STL-IDS and single SVM, respectively. Nevertheless, STL-IDS for five-category classification is better than single SVM with regard to the other performance metrics and processing time. Table 6 shows that the training and testing times of STL-IDS are less than those of single SVM, indicating that our model is more efficient than single SVM in all situations.

FIGURE 6. Accuracy, precision, recall, and F-measure values for STL-IDS and single SVM for five-category classification based on training data.

TABLE 6. Training and testing time and number of support vectors (NSV) comparison for STL-IDS and single SVM for five-category classification based on training data.

C. THE EFFECT OF HYPER-PARAMETERS AND HIDDEN UNITS NUMBER SETTING ON OUR MODEL EFFICIENCY
The theoretical analysis of the SAE and STL shows that the number of hidden units and the sparsity parameter are the main parameters influencing the classification accuracy and training time. Thus, hyper-parameter optimization is a crucial challenge in developing and designing an effective deep learning model for network intrusion detection. In addition, investigating the effect of the number of hidden units and the sparsity parameter on the performance of our model, and decreasing the training and testing times, is another challenge.
We were able to increase the performance of single SVM by using a k-fold cross-validation strategy to search for the best parameters of the RBF kernel (C = 5.6569 and Gamma = 1.0667). Also, for the hyper-parameter tuning of the
sparse autoencoder, we use a cross-validation strategy on the KDDTrain+ part of the NSL-KDD dataset. After the best hyper-parameter values are selected, our model is trained and tested again with 10-fold cross-validation on KDDTrain+ using those values. It is also trained and tested with KDDTrain+ and KDDTest+, respectively. The optimization of our model was performed over the key hyper-parameters, whose tested values are given in Table 7 and Table 8. In our experiments, our model achieves its highest accuracy for binary classification when ρ = 0.50, λ = 0.000001, β = 3, and the number of epochs is 1000. It achieves its highest accuracy for multiclass classification when ρ = 0.77, λ = 0.000005, β = 3, and the number of epochs is 500.

TABLE 7. The tested values of hyper-parameters for our model for five-category classification.

TABLE 8. The tested values of hyper-parameters for our model for binary classification.

FIGURE 7. The accuracy and testing time on the KDDTest+ dataset in the binary classification with different numbers of hidden units.

FIGURE 8. The accuracy and testing time on the KDDTest+ dataset in the multiclass classification with different numbers of hidden units.

FIGURE 9. The training time on the KDDTrain+ dataset in the binary and multiclass classification with different numbers of hidden units.
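To make the roles of the three tuned hyper-parameters concrete, the sketch below evaluates a sparse autoencoder cost of the standard form given in the lecture notes cited as [31]: a reconstruction term, a weight-decay term scaled by λ, and a KL-divergence sparsity penalty scaled by β with target activation ρ. It assumes this standard form matches the cost used in the paper, which this part of the text does not restate; the function names and toy inputs are illustrative.

```python
import math
from typing import Sequence


def kl_sparsity(rho: float, rho_hat: float) -> float:
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat)."""
    return (rho * math.log(rho / rho_hat)
            + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))


def sae_cost(squared_errors: Sequence[float],
             weights: Sequence[float],
             mean_activations: Sequence[float],
             rho: float, lam: float, beta: float) -> float:
    """Sparse autoencoder cost (assumed standard form, see lead-in).

    squared_errors   -- ||x_hat - x||^2 for each training example
    weights          -- all weight values, flattened (biases excluded)
    mean_activations -- average activation of each hidden unit over the data
    """
    m = len(squared_errors)
    reconstruction = sum(0.5 * e for e in squared_errors) / m
    weight_decay = 0.5 * lam * sum(w * w for w in weights)
    sparsity = beta * sum(kl_sparsity(rho, r) for r in mean_activations)
    return reconstruction + weight_decay + sparsity


# Binary-classification settings quoted in the text: rho = 0.50, lambda = 1e-6, beta = 3.
cost_on_target = sae_cost([2.0, 4.0], [1.0, -1.0], [0.5, 0.5],
                          rho=0.50, lam=1e-6, beta=3)
cost_off_target = sae_cost([2.0, 4.0], [1.0, -1.0], [0.9, 0.1],
                           rho=0.50, lam=1e-6, beta=3)
```

The penalty is zero when every hidden unit's mean activation equals ρ and grows as the activations drift away from it, so β controls how strongly sparsity is enforced relative to reconstruction quality.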

In other experiments, we investigate the ability of the sparse autoencoder to perform feature learning and dimensionality reduction on our data in order to enhance the accuracy and decrease the training and testing times of SVM. Thus, we restrict the number of hidden-layer units to be less than the number of input units, which yields a compressed representation and achieves the desired dimensionality reduction. In our experiments, our model achieves a higher accuracy when the number of hidden units is 30 and the number of epochs is 1000. The shorter SVM training and testing times relative to single SVM, discussed in the subsections above, are obtained almost whenever the number of hidden units is less than half the number of original input units (fewer than 60 hidden units). Figure 7, Figure 8, and Figure 9 show the test classification accuracy of our model and the training and testing times of SVM for different numbers of hidden units.
Finally, to test the effect of the sparsity parameter on the classification accuracy and training time, we use KDDTrain+ as our training data and KDDTest+ as our testing data. We use the best values in Table 7 and Table 8 for the other hyper-parameters of our model. The number of hidden units is 30 and 13 for binary and multiclass classification, respectively. As shown in Figure 10 and Figure 11, the optimal classification accuracy is obtained when the sparsity parameter equals 0.77 and 0.50 for the binary and multiclass classification, respectively.
In this section, we explored how to exploit the impact of the sparsity parameter and the number of hidden units on our model to obtain a higher classification accuracy and lower training times.

D. DISCUSSION AND ADDITIONAL COMPARISONS
We also verify our model's superiority by comparing its detection accuracy with that obtained from other classification algorithms in related studies. Yin et al. [22] claimed that their model, which was constructed with a recurrent neural network and a soft-max classifier and applied on KDDTest+ for
evaluation, obtained 83.28% and 81.29% detection accuracy for two-category and five-category classification, respectively. The authors compared the results of their model with those of many classification algorithms discussed in [22] and [35], as demonstrated in Table 9 and Table 10. Table 9 and Table 10 show that our model is a competitor of [22] in terms of accuracy in two-category classification, since we improved SVM with a suitable learning algorithm (SAE) in the STL approach. Moreover, our model is a competitor in terms of time complexity, since we used the SAE technique for dimensionality reduction and data representation.

TABLE 9. Additional performance comparisons with several related approaches in the binary classification.

FIGURE 10. Accuracy on the KDDTest+ dataset in the binary and multiclass classification with different sparsity parameters.

FIGURE 11. The training time on the KDDTrain+ dataset in the binary and multiclass classification with different sparsity parameters.
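The time-complexity argument rests on the O(d · nSV) bound from Section III-C, which can be seen in the shape of an RBF decision function. The sketch below is a plain-Python illustration, not LIBSVM's implementation: the function names are assumptions, the kernel is written with LIBSVM-style `gamma` rather than σ, and the support-vector count used in the comparison is hypothetical.

```python
import math
from typing import Sequence


def rbf_decision(x: Sequence[float],
                 support_vectors: Sequence[Sequence[float]],
                 dual_coefs: Sequence[float],
                 bias: float,
                 gamma: float) -> float:
    """Binary RBF-SVM decision value: bias + sum_i coef_i * exp(-gamma * ||x - sv_i||^2)."""
    total = bias
    for sv, coef in zip(support_vectors, dual_coefs):      # nSV iterations
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, sv))  # d terms each
        total += coef * math.exp(-gamma * sq_dist)
    return total


def kernel_work(d: int, n_sv: int) -> int:
    """Number of per-feature distance terms for one prediction: d * nSV."""
    return d * n_sv


# Tiny worked example with two hypothetical support vectors.
value = rbf_decision([0.0, 0.0], [[0.0, 0.0], [1.0, 0.0]], [1.0, -1.0],
                     bias=0.0, gamma=1.0)

# 122 encoded input features versus 30 learned features (both from the text),
# with a hypothetical 1000 support vectors in each case.
speedup_from_d_alone = kernel_work(122, 1000) / kernel_work(30, 1000)
```

Holding the number of support vectors fixed, shrinking the input from 122 to 30 dimensions already cuts the per-prediction kernel work roughly fourfold; any reduction in nSV multiplies that saving.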


TABLE 10. Additional performance comparisons with several related approaches in the multiclass classification.

Wang et al. [23] also claimed that their model achieved an accuracy of 99.31% for binary classification on the basis of 10-fold cross-validation on KDDTrain+ (the training data part of the NSL-KDD dataset). Table 9 demonstrates that our model achieves better accuracy than the model in [23]. However, their model performs better in terms of training time. Wang et al. [23] used the logarithm marginal density ratio transformation (LMDRT) to verify their model's efficiency in enhancing the accuracy and reducing the training time of SVM. The weakness of their method is that the reduction in testing time is not considered, unlike in our proposed method. Yousefi-Azar et al. [10] evaluated their feature learning model based on a deep AE, and they applied their model with many shallow machine learning algorithms, such as SVM. However, the highest accuracy they achieved was 83.30%. As demonstrated in Table 9, our model is superior to their model. Moreover, their model is computationally expensive because it contains many hidden layers and involves two training stages.
Javaid et al. [2] claimed that their deep learning approach for network intrusion detection, which depends on STL combining SAE with soft-max, obtains an accuracy of 88.39% and 79.10% for two-category and five-category classification, respectively. Their approach was evaluated on KDDTest+ (the testing data of the NSL-KDD dataset). Their model obtained an accuracy of less than 99% for two-category and five-category classification based on KDDTrain+, evaluated using 10-fold cross-validation. The experimental results show that our method outperforms the model in [2] by 1.38% in detection accuracy for five-category classification when KDDTrain+ and KDDTest+ are used separately for training and testing. For the KDDTrain+ dataset, our method achieves better results than [2]: its accuracy for two-category and five-category classification is 99.423% and 99.414%, respectively. This experimental evidence and the comparison of our method with that of [2] demonstrate that our method achieves better results than [2] in terms of detection accuracy and time complexity because we used a good feature
learning algorithm (SAE) with a strong classifier (SVM) instead of a weak classifier (soft-max). We used SAE not only for feature learning, which produces a new, good representation of our dataset that leads to good performance, but also for dimensionality reduction, which yields the low-dimensional features that improve SVM classification accuracy and decrease the training and testing times of the algorithm. In order to compare the performance of STL-IDS with some recent related works that were evaluated on the KDD-CUP'99 dataset, we evaluated our proposed model on the KDD-CUP'99 dataset. We used the 10% KDDCup dataset, which contains 494,021 samples, for the training phase and the corrected-labels KDDCup 99 dataset, which contains 311,029 samples, for the testing phase. After we removed all duplicate samples in both datasets to avoid bias towards more frequent samples, the training and testing datasets consist of 145,586 and 77,291 instances, respectively. The results of this experiment show that our model achieves a higher accuracy rate than most of the existing methods on the same dataset.
The accuracy of our model is lower than that of DAE-IDS [27], and the gap remains within 0.7% and 1.4% for multiclass and binary classification, respectively. This indicates that our model has reached or exceeded the average overall accuracy level of other state-of-the-art approaches and methods. Moreover, Table 9 and Table 10 demonstrate that the accuracy of our proposed method is close to or higher than that of other state-of-the-art approaches.

V. CONCLUSION AND FUTURE WORK
The proposed approach is another means of utilizing the STL framework based on SAE for feature learning and dimensionality reduction, using SVM instead of soft-max for classification. The experimental results show that our model improves SVM classification accuracy and accelerates training and testing times. It also exhibits good performance in two-category and five-category classification. Compared with previous models and shallow classification methods, such as J48, naive Bayesian, RF, and SVM, our approach achieved a higher accuracy rate on the NSL-KDD dataset, particularly under five-category classification. Future work on our proposed approach will focus on further improvement by using multiple stages of STL and a hybrid feature learning model for good feature representation and dimensionality reduction. Additionally, the model's training and testing times can be further reduced by implementing the system on parallel platforms or with GPU acceleration.

ACKNOWLEDGEMENTS
The authors would like to thank the anonymous referees for their helpful comments and suggestions.

CONFLICTS OF INTEREST
The authors declare no conflict of interest.

REFERENCES
[1] M. Ahmed, A. N. Mahmood, and J. Hu, “A survey of network anomaly detection techniques,” J. Netw. Comput. Appl., vol. 60, pp. 19–31, Jan. 2016.
[2] A. Javaid, Q. Niyaz, W. Sun, and M. Alam, “A deep learning approach for network intrusion detection system,” in Proc. 9th EAI Int. Conf. Bio-Inspired Inf. Commun. Technol. (Formerly BIONETICS), 2016, pp. 21–26.
[3] N. Shone, T. N. Ngoc, V. D. Phai, and Q. Shi, “A deep learning approach to network intrusion detection,” IEEE Trans. Emerg. Topics Comput. Intell., vol. 2, no. 1, pp. 41–50, Feb. 2018.
[4] B. Dong and X. Wang, “Comparison deep learning method to traditional methods using for network intrusion detection,” in Proc. 8th IEEE Int. Conf. Commun. Softw. Netw. (ICCSN), Jun. 2016, pp. 581–585.
[5] L.-S. Chen and J.-S. Syu, “Feature extraction based approaches for improving the performance of intrusion detection systems,” in Proc. Int. MultiConf. Eng. Comput. Scientists, vol. 1, Mar. 2015, pp. 1–6.
[6] S. Seo, S. Park, and J. Kim, “Improvement of network intrusion detection accuracy by using restricted Boltzmann machine,” in Proc. 8th Int. Conf. Comput. Intell. Commun. Netw. (CICN), Dec. 2016, pp. 413–417.
[7] R. Salakhutdinov and H. Larochelle, “Efficient learning of deep Boltzmann machines,” in Proc. 13th Int. Conf. Artif. Intell. Statist., 2010, pp. 693–700.
[8] M. Z. Alom, V. R. Bontupalli, and T. M. Taha, “Intrusion detection using deep belief networks,” in Proc. Nat. Aerosp. Electron. Conf. (NAECON), Jun. 2015, pp. 339–344.
[9] M. A. Salama, H. F. Eid, R. A. Ramadan, A. Darwish, and A. E. Hassanien, “Hybrid intelligent intrusion detection scheme BT,” in Soft Computing in Industrial Applications. Berlin, Germany: Springer, 2011, pp. 293–303.
[10] M. Yousefi-Azar, V. Varadharajan, L. Hamey, and U. Tupakula, “Autoencoder-based feature learning for cyber security applications,” in Proc. Int. Joint Conf. Neural Netw. (IJCNN), May 2017, pp. 3854–3861.
[11] G. Kim, S. Lee, and S. Kim, “A novel hybrid intrusion detection method integrating anomaly detection with misuse detection,” Expert Syst. Appl., vol. 41, no. 4, pp. 1690–1700, 2014.
[12] Y. Liao and V. R. Vemuri, “Use of K-nearest neighbor classifier for intrusion detection,” Comput. Secur., vol. 21, no. 5, pp. 439–448, 2002.
[13] L. Koc, T. A. Mazzuchi, and S. Sarkani, “A network intrusion detection system based on a hidden Naïve Bayes multiclass classifier,” Expert Syst. Appl., vol. 39, no. 18, pp. 13492–13500, 2012.
[14] S. Mukherjee and N. Sharma, “Intrusion detection using naive bayes classifier with feature reduction,” Procedia Technol., vol. 4, pp. 119–128, Feb. 2012.
[15] E. de la Hoz, E. de la Hoz, A. Ortiz, J. Ortega, and A. Martínez-Álvarez, “Feature selection by multi-objective optimisation: Application to network anomaly detection by hierarchical self-organising maps,” Knowl.-Based Syst., vol. 71, pp. 322–338, Nov. 2014.
[16] E. De la Hoz, E. De La Hoz, A. Ortiz, J. Ortega, and B. Prieto, “PCA filtering and probabilistic SOM for network intrusion detection,” Neurocomputing, vol. 164, pp. 71–81, Sep. 2015.
[17] R. Sen, M. Chattopadhyay, and N. Sen, “An efficient approach to develop an intrusion detection system based on multi layer backpropagation neural network algorithm: IDS using BPNN algorithm,” in Proc. ACM SIGMIS Conf. Comput. People Res., 2015, pp. 105–108.
[18] T. Mehmood and H. B. M. Rais, “SVM for network anomaly detection using ACO feature subset,” in Proc. Int. Symp. Math. Sci. Comput. Res. (iSMSC), May 2015, pp. 121–126.
[19] S. Mukkamala, G. Janoski, and A. Sung, “Intrusion detection using neural networks and support vector machines,” in Proc. Int. Joint Conf. Neural Netw. (IJCNN), vol. 2, May 2002, pp. 1702–1707.
[20] G. Kou, Y. Peng, Z. Chen, and Y. Shi, “Multiple criteria mathematical programming for multi-class classification and application in network intrusion detection,” Inf. Sci., vol. 179, no. 4, pp. 371–381, 2009.
[21] R. A. R. Ashfaq, X.-Z. Wang, J. Z. Huang, H. Abbas, and Y.-L. He, “Fuzziness based semi-supervised learning approach for intrusion detection system,” Inf. Sci., vol. 378, pp. 484–497, Feb. 2017.
[22] C. Yin, Y. Zhu, J. Fei, and X. He, “A deep learning approach for intrusion detection using recurrent neural networks,” IEEE Access, vol. 5, pp. 21954–21961, 2017.
[23] H. Wang, J. Gu, and S. Wang, “An effective intrusion detection framework based on SVM with feature augmentation,” Knowl.-Based Syst., vol. 136, pp. 130–139, Nov. 2017.


[24] D. Perez, M. A. Astor, D. P. Abreu, and E. Scalise, “Intrusion detection in computer networks using hybrid machine learning techniques,” in Proc. 43rd Latin Amer. Comput. Conf. (CLEI), Sep. 2017, pp. 1–10.
[25] A. A. Diro and N. Chilamkurti, “Distributed attack detection scheme using deep learning approach for Internet of Things,” Future Gener. Comput. Syst., vol. 82, pp. 761–768, May 2018.
[26] W. Wang et al., “HAST-IDS: Learning hierarchical spatial-temporal features using deep neural networks to improve intrusion detection,” IEEE Access, vol. 6, pp. 1792–1806, 2018.
[27] F. Farahnakian and J. Heikkonen, “A deep auto-encoder based approach for intrusion detection system,” in Proc. 20th Int. Conf. Adv. Commun. Technol. (ICACT), Feb. 2018, p. 1.
[28] P. Madani and N. Vlajic, “Robustness of deep autoencoder in intrusion detection under adversarial contamination,” in Proc. 5th Annu. Symp. Bootcamp Hot Topics Sci. Secur., 2018, Art. no. 1.
[29] R. Raina, A. Battle, H. Lee, B. Packer, and A. Y. Ng, “Self-taught learning: Transfer learning from unlabeled data,” in Proc. 24th Int. Conf. Mach. Learn., 2007, pp. 759–766.
[30] A. Coates, A. Ng, and H. Lee, “An analysis of single-layer networks in unsupervised feature learning,” in Proc. 14th Int. Conf. Artif. Intell. Statist., 2011, pp. 215–223.
[31] A. Ng, “Sparse autoencoder. CS294A lecture notes,” Stanford Univ., Stanford, CA, USA, Tech. Rep. 72, 2011.
[32] H. Liu, T. Taniguchi, T. Takano, Y. Tanaka, K. Takenaka, and T. Bando, “Visualization of driving behavior using deep sparse autoencoder,” in Proc. IEEE Intell. Vehicles Symp., Jun. 2014, pp. 1427–1434.
[33] J. Deng, Z. Zhang, E. Marchi, and B. Schuller, “Sparse autoencoder-based feature transfer learning for speech emotion recognition,” in Proc. Humaine Assoc. Conf. Affect. Comput. Intell. Interact., Sep. 2013, pp. 511–516.
[34] C.-F. Tsai, Y.-F. Hsu, C.-Y. Lin, and W.-Y. Lin, “Intrusion detection by machine learning: A review,” Expert Syst. Appl., vol. 36, no. 10, pp. 11994–12000, 2009.
[35] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, “A detailed analysis of the KDD CUP 99 data set,” in Proc. IEEE Symp. Comput. Intell. Secur. Defense Appl., Jul. 2009, pp. 1–6.
[36] N. Paulauskas and J. Auskalnis, “Analysis of data pre-processing influence on intrusion detection using NSL-KDD dataset,” in Proc. Open Conf. Elect. Electron. Inf. Sci. (eStream), Apr. 2017, pp. 1–5.
[37] C. L. C. Chang. Libsvm. Accessed: Jan. 5, 2018. [Online]. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
[38] S. Maji, A. C. Berg, and J. Malik, “Classification using intersection kernel support vector machines is efficient,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2008, pp. 1–8.
[39] P. Ilayaraja, N. V. Neeba, and C. V. Jawahar, “Efficient implementation of SVM for large class problems,” in Proc. 19th Int. Conf. Pattern Recognit., Dec. 2008, pp. 1–4.
[40] W. L. Al-Yaseen, Z. A. Othman, and M. Z. A. Nazri, “Multi-level hybrid support vector machine and extreme learning machine based on modified k-means for intrusion detection system,” Expert Syst. Appl., vol. 67, pp. 296–303, Jan. 2017.
[41] Y. Li, R. Ma, and R. Jiao, “A hybrid malicious code detection method based on deep learning,” Methods, vol. 9, no. 5, pp. 205–216, 2015.
[42] N. Gao, L. Gao, Q. Gao, and H. Wang, “An intrusion detection model based on deep belief networks,” in Proc. 2nd Int. Conf. Adv. Cloud Big Data, Nov. 2014, pp. 247–252.
[43] K. Alrawashdeh and C. Purdy, “Toward an online anomaly intrusion detection system based on deep learning,” in Proc. 15th IEEE Int. Conf. Mach. Learn. Appl. (ICMLA), Anaheim, CA, USA, Feb. 2017, pp. 195–200.
[44] R. C. Staudemeyer, “Applying long short-term memory recurrent neural networks to intrusion detection,” South Afr. Comput. J., vol. 56, no. 1, pp. 136–154, 2015.

MAJJED AL-QATF received the B.S. degree in network technology and computer security from Sana'a University, Sana'a, Yemen, in 2013. He is currently pursuing the M.S. degree in computer science with the School of Information Science and Engineering, Central South University, Changsha, China. His research interests include deep learning, information security, and data mining.

YU LASHENG received the B.Sc. degree in computer science and the M.S. and Ph.D. degrees in control theory and control engineering from Central South University, China. He is currently a Vice Professor with Central South University. He has authored at least 70 papers on agent technologies or algorithms and three books. He has organized and implemented many projects that have greatly benefitted our society. His main research interests include smart computing, agent technologies and applications, structure and algorithm, and distributed computing. He is an ACM and CCF Member, and an ACM/ICPC Golden Medal Coach. He is an Editor of the Journal of Convergence Information Technology and the Advances in Information Sciences and Service Sciences. He is also a reviewer for Future Generation Computer Systems, Journal of Parallel and Distributed Computing, Artificial Intelligence Review, and some other journals and conferences.

MOHAMMED AL-HABIB received the B.S. degree in computer sciences and information systems from Thamar University, Thamar, Yemen, in 2011. He is currently pursuing the M.S. degree in computer science with the School of Information Science and Engineering, Central South University, Changsha, China. His research interests include deep learning, computer vision, and data mining.

KAMAL AL-SABAHI received the B.S. degree in computer science from Sana'a University, Sana'a, Yemen, in 2008, and the M.S. degree in information technology from OUM University, Kuala Lumpur, Malaysia, in 2015. He is currently pursuing the Ph.D. degree in computer science with the School of Information Science and Engineering, Central South University, Changsha, China. His research interests include deep learning, natural language processing, knowledge engineering, and data mining.
