Deep Learning Approach Combining Sparse Autoencoder With SVM for Network Intrusion Detection
ABSTRACT Network intrusion detection systems (NIDSs) provide a better solution to network security
than other traditional network defense technologies, such as firewall systems. The success of NIDS is highly
dependent on the performance of the algorithms and improvement methods used to increase the classification
accuracy and decrease the training and testing times of the algorithms. We propose an effective deep learning
approach, self-taught learning (STL)-IDS, based on the STL framework. The proposed approach is used
for feature learning and dimensionality reduction. It reduces training and testing time considerably and
effectively improves the prediction accuracy of support vector machines (SVM) with regard to attacks. The
proposed model is built using the sparse autoencoder mechanism, which is an effective learning algorithm for
reconstructing a new feature representation in an unsupervised manner. After the pre-training stage, the new
features are fed into the SVM algorithm to improve its detection capability for intrusion and classification
accuracy. Moreover, the efficiency of the approach in binary and multiclass classification is studied and
compared with that of shallow classification methods, such as J48, naive Bayesian, random forest, and SVM.
Results show that our approach reduces SVM training and testing times and performs better than
most previous approaches in terms of performance metrics in binary and multiclass classification. The
proposed STL-IDS approach improves network intrusion detection and provides a new research method for
intrusion detection.
INDEX TERMS Network security, network intrusion detection system, deep learning, sparse autoencoder,
SVM, self-taught learning, NSL-KDD.
credit cards, military applications, and many medical applications [1]. The following can be performed to develop an effective anomaly-based intrusion detection system. First, proper feature selection based on feature extraction and dimensionality reduction must be implemented when extracting a subset of correlated features from the network traffic dataset to enhance classification results [2]. Second, the best techniques must be used to help enhance the classification results and increase the classification speed.

Various supervised and unsupervised machine learning techniques can be used or integrated with other algorithms in ADNIDS to enhance intrusion detection performance and increase the classification rate; machine learning algorithms such as decision tree, random forest, self-organizing maps (SOM), and support vector machines (SVM) have been utilized to detect and classify intrusions. Many researchers who investigated NIDS focused on using unsupervised learning techniques followed by shallow machine learning, such as SVM, random forest, and naïve Bayesian [3], [4], because using unsupervised learning techniques before shallow machine learning improves the detection rate. Unsupervised learning algorithms and dimension reduction methods are frequently used in feature extraction and feature representation to improve data quality [5] and to improve the classification results of shallow and traditional supervised machine learning algorithms.

Recently, unsupervised deep learning-based methods have achieved remarkable results in vision computing applications. In addition, deep learning techniques [6]–[10] can be adopted as unsupervised feature learning methods that help supervised machine learning improve its performance and its identification of network traffic anomalies while reducing the testing and training times.

Deep learning approaches have good potential to achieve effective data representations for building improved approaches. Therefore, on the basis of a self-taught learning framework and inspired by the combination of the sparse autoencoder (SAE) with SVM, we propose using self-taught learning (STL) for good data representation and SVM for the classification task. STL is a deep learning approach that is based on the SAE algorithm. It rebuilds the input representation and converts it into a feature representation related to the input data, thereby improving the performance of the classification task considerably. The main contributions of this work are as follows:

(1) We develop a novel deep learning approach, STL-IDS (a self-taught learning-based intrusion detection system), based on the STL framework by combining SAE and SVM for network intrusion detection. We study the potential of our approach to achieve effective representation and dimensionality reduction for the improvement of the classification results of shallow and traditional supervised machine learning algorithms, such as SVM, in binary and multiclass classification.

(2) We combine deep and shallow learning techniques in our novel approach and exploit their respective strengths. Better or at least competitive results are achieved compared with the results of similar approaches. Moreover, our approach considerably reduces the training and testing times of SVM.

(3) We use the NSL-KDD dataset to compare the efficiency of our approach with that of single SVM and that of different classification algorithms, such as naïve Bayesian, random forest, multi-layer perceptron, and many other classification algorithms in related work on binary and multiclass classification.

Experimental results show that our approach is suitable for intrusion detection. Its performance is superior to that of traditional classification algorithms using the NSL-KDD dataset and most previous approaches in terms of binary and multiclass classification.

The rest of this paper is structured as follows. We briefly describe related studies, particularly those that examined unsupervised machine learning techniques and SVM-based deep learning approaches, in Section II. In Section III, we provide an overview of our proposed methodology for NIDS implementation, SVM, the NSL-KDD dataset, data processing, and evaluation metrics. In Section IV, the performance of our approach is evaluated based on the experimental results and compared with that of related approaches for NIDS. Our conclusions and directions for future research are presented in Section V.

II. RELATED WORK
Network intrusion detection has become the most important part of the infrastructure of defense networking systems in information security. Various machine learning algorithms or approaches are applied in NIDS to detect and distinguish between normal traffic and anomalies or attacks in network traffic; these approaches include decision tree [11], k-nearest neighbor (K-NN) [12], naïve Bayes network [13], [14], SOM [15], [16], SVM, and artificial neural network (ANN) [17]. SVM demonstrates better performance than other traditional machine learning classification techniques [18].

A work proposed by Mukkamala et al. [19] compared the performance of SVM and ANN on the KDD CUP 99 dataset. The results showed that the detection results of SVM are better than those of ANN. In [20], SVM, naïve Bayes, logistic regression, decision tree (DT), and classification and regression tree (CART) approaches were compared in terms of intrusion detection classification by using the KDD CUP 99 dataset. The results showed that SVM has distinct advantages. Ashfaq et al. [21] developed a new method by using a fuzziness approach based on semi-supervised learning for intrusion detection. This method uses a neural network with random weights and plays an important role in the detection rate of NIDS because it decreases the computational cost. The model was evaluated on the NSL-KDD dataset, but its performance was studied only on the binary classification task. In [22], a deep learning model based on a recurrent neural network with a soft-max classifier
was presented. The model was evaluated on the NSL-KDD dataset, and its performance in binary and multiclass classification was studied. The model showed the capability of deep learning to model high-dimensional features, and an improved accuracy rate of intrusion detection was achieved. However, the training time was large. SVM is one of the most important traditional machine learning algorithms that depend on statistical learning theory, and it uses structural risk minimization to achieve a strong generalization capability. However, SVM presents a constrained quadratic programming problem that requires a large memory and considerable training time, and its training complexity is highly dependent on the size of the dataset. Therefore, the performance of SVM-based IDS needs to be enhanced, and the training and testing times must be reduced.

To address these limitations, many studies have improved SVM-based IDS by combining SVM with other methods. In [23], efficient machine learning based on SVM with feature augmentation was presented to increase the quality of the SVM classifier. The method improved the intrusion detection rate of SVM and reduced the required training time. The weakness of this approach is that its detection accuracy is insufficient and the time factor is not considered.

Many researchers have combined supervised and unsupervised learning algorithms to create a model that can increase the detection rate of supervised machine learning classifiers, such as SVM and random forest. In [24], many unsupervised learning algorithms were combined with SVM and a neural network (NN) to improve the performance of the intrusion detection system. The authors designed, implemented, and evaluated many hybrid models that use principal component analysis (PCA) or Gradual Feature Reduction (GFR) for feature selection and SVM or NN for classification. The results showed that hybrid models can effectively detect known and unknown attacks, and that the PCA and GFR feature selection techniques are computationally expensive in terms of training and testing times.

Unsupervised learning based on deep learning has recently been used in feature extraction and dimensionality reduction, leading to an increase in the detection rate and a decrease in the processing time of supervised machine learning algorithms, such as SVM and soft-max. Alom et al. [8] proposed a deep learning approach based on a stacked restricted Boltzmann machine for feature extraction and dimensionality reduction and on SVM for the classification of network intrusions. The approach was implemented on merely 40% of the NSL-KDD training dataset. The approach performed better than single SVM, single deep belief networks (DBN), and many other approaches. In [10], a feature learning model based on AE was presented to achieve a good representation of different feature sets. This feature learning model was applied to malware classification and anomaly-based network intrusion detection by using the NSL-KDD dataset. The topology of the AE used was different from the common topology, and the features extracted by the AE were applied to many traditional machine learning algorithms, such as SVM, K-NN, and Gaussian naïve Bayes. The experimental results showed that the model is an improvement over traditional machine learning. However, the model is computationally expensive because it comprises many hidden layers and entails two training stages.

Diro and Chilamkurti [25] presented an IoT/fog network attack detection system based on distributed deep learning. The performance of their model was compared against shallow and traditional machine learning approaches. They used a deep learning model with three hidden layers for feature learning and soft-max regression (SMR) for the classification task. However, their model is computationally expensive compared with our approach. Their model was evaluated on the NSL-KDD dataset in both binary and multiclass classification. Their method demonstrated that distributed attack detection can detect attacks better than a centralized detection system because of parameter sharing, which can avoid local minima in training. Our approach differs significantly in that we aim to detect attacks through a centralized detection system, whereas their model was aimed at detecting attacks through a distributed detection system in the social Internet of Things.

Wang et al. [26] proposed a novel intrusion detection system called the hierarchical spatial-temporal feature-based intrusion detection system (HAST-IDS); their system used a deep convolutional neural network for learning the low-level spatial features of network traffic and long short-term memory (LSTM) networks for learning high-level temporal features. They used the standard DARPA and ISCX2012 datasets to evaluate the performance of their proposed system. Their model is computationally expensive compared with our approach because it uses two stages for feature learning. Farahnakian and Heikkonen [27] proposed a deep learning approach for intrusion detection. The model was built using a deep autoencoder and trained in a greedy layer-wise fashion to avoid overfitting. The performance of their approach was evaluated on KDD CUP 99 (the old version of NSL-KDD), and its performance was studied on both binary and multiclass classification. Although high accuracy was achieved for the intrusion detection task, their approach is computationally expensive compared with ours.

Madani and Vlajic [28] studied the viability of using a deep autoencoder for anomaly detection in an adaptive intrusion detection system under adversarial contamination. They used the reconstruction error of the autoencoder as a measure for anomaly detection and the NSL-KDD dataset for performance evaluation. Our approach is significantly different since we use the autoencoder for feature learning and dimensionality reduction. Moreover, we use the performance metrics, training time, and testing time to evaluate the performance of our model for anomaly detection.

In [2], a NIDS based on unsupervised deep learning techniques was developed using SAE for feature learning and soft-max regression (SMR) for classification. Evaluation was based on all performance metrics on the NSL-KDD dataset.
representation is related to unlabeled data. In the second stage, the new representation is combined with the labeled data, x_l, and then any supervised algorithm, such as SVM, can be used for the classification task.

Different methods can be used for UFL [30]; for our model, we adopt the SAE, an unsupervised learning algorithm that consists of only a single hidden layer. It can be used for feature learning and dimensionality reduction instead of PCA to achieve a significantly nonlinear generalization. Its input and output layers have the same number of units: the input and output layers contain N units, and the hidden layer contains K units. As shown in Figure 2(a), the output values x̂_i in the output layer are similar to the input values x_i in the input layer.

In recent years, increasing attention has been paid to the single-layer SAE as a feature learning and dimensionality reduction method. The SAE can learn effective low-dimensional features from the raw data, making it easier to extract efficient and appropriate low-dimensional features automatically for the classification process.

The feature extraction and dimensionality reduction process in the SAE involves two steps: encoding and decoding. The encoding step maps the input data x_i into the hidden units' representations, as shown in (1a):

    h = f(X) = g(WX + b_1)    (1a)

The decoding step maps the hidden units' representations into the reconstructed data, as shown in (1b):

    Z = g(Vh + b_2)    (1b)

In the above equations, X = (x_1, x_2, x_3, x_4, ..., x_i) is the high-dimensional input data vector, Z = (x̂_1, x̂_2, x̂_3, ..., x̂_m) is the reconstruction vector of the input data, and h = (h_1, h_2, h_3, ..., h_k) is the low-dimensional vector output from the hidden layer.

The SAE applies the backpropagation algorithm to obtain the optimal values for its weight matrices W ∈ R^{K×N} and V ∈ R^{N×K} and bias vectors b_1 ∈ R^{K×1} and b_2 ∈ R^{N×1}, which attempt to learn and reconstruct output values x̂_i that are equal to the inputs x_i. In other words, an approximation to the identity function is learned so that the output values are similar to the input values; that is, it uses y^{(i)} = x^{(i)} [3], [31]. The activation function is chosen to be the sigmoid function, g(z) = 1 / (1 + e^{-z}), whose output range is [0, 1]; it is used for the activations (h_{W,b}) of the nodes in the hidden and output layers, as shown in (1a) and (1b). The SAE is trained by minimizing the cost function in (2):

    T = \frac{1}{2m} \sum_{i=1}^{m} \| x_i - \hat{x}_i \|^2 + \frac{\lambda}{2} \left( \sum_{k,n} W_{k,n}^2 + \sum_{n,k} V_{n,k}^2 + \sum_{k} b_{1,k}^2 + \sum_{n} b_{2,n}^2 \right) + \beta \sum_{j=1}^{K} \mathrm{KL}(\rho \,\|\, \hat{p}_j)    (2)

where ρ is a sparsity constraint parameter that ranges from 0 to 1 and β controls the sparsity penalty term. KL(ρ ‖ p̂_j) attains its minimum value when ρ = p̂_j, where p̂_j denotes the average activation value of hidden unit j over all training inputs x. After learning the optimal values for W and b_1 by applying the SAE on the unlabeled data x_u, we evaluate the feature representation a = h for the labeled data (x_l, y). We use this new feature representation, h, with the label vector, y, in SVM for the classification task in the second stage of STL, as shown in Figure 2(b). Figure 2 shows an architectural diagram of the proposed STL. We apply STL based on SAE for good data representation because of its simple and straightforward implementation and its capability to learn the original expressions and structures of the data. The wide application of STL extends particularly to image identification [32], [33], SVM for classification tasks, and distinguishing different types of intrusions, because combining robust classifiers, such as SVM, with SAE leads to enhanced performance in intrusion detection. Furthermore, the features extracted by the SAE algorithm are passed to the SVM classifier for intrusion detection. The performance accuracy rate of our method is better than that of SVM alone, and the training and testing times of SVM are reduced.
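To make the two-stage STL pipeline above concrete, the following minimal sketch shows one way it could be implemented. It is not the authors' MATLAB/LIBSVM code: it assumes preprocessed NumPy arrays x_unlabeled and (x_labeled, y), trains the single-hidden-layer SAE by plain batch gradient descent on the cost of (2) with the sigmoid activations of (1a) and (1b), and substitutes scikit-learn's RBF-kernel SVC for LIBSVM. The hidden size K, learning rate, and hyper-parameter values are illustrative only.

    import numpy as np
    from sklearn.svm import SVC  # stands in here for the LIBSVM package used in the paper

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_sparse_autoencoder(X, K, lam=1e-4, beta=3.0, rho=0.05, lr=0.1, epochs=200):
        """Single-hidden-layer sparse autoencoder trained by minimizing the cost of
        Eq. (2): reconstruction error + weight decay + KL sparsity penalty.
        X has shape (m, N); returns the encoder parameters (W, b1)."""
        m, N = X.shape
        rng = np.random.default_rng(0)
        W = rng.normal(0.0, 0.01, (K, N))   # encoder weights, W in R^{KxN}
        V = rng.normal(0.0, 0.01, (N, K))   # decoder weights, V in R^{NxK}
        b1, b2 = np.zeros(K), np.zeros(N)
        for _ in range(epochs):
            H = sigmoid(X @ W.T + b1)       # hidden representation h, Eq. (1a)
            Z = sigmoid(H @ V.T + b2)       # reconstruction, Eq. (1b)
            p_hat = H.mean(axis=0)          # average activation of each hidden unit
            # output-layer error for the (1/2m) * sum ||x - x_hat||^2 term
            d_out = (Z - X) * Z * (1.0 - Z) / m
            # backpropagated hidden error plus the gradient of the KL sparsity term
            kl_grad = beta * (-rho / p_hat + (1.0 - rho) / (1.0 - p_hat)) / m
            d_hid = (d_out @ V + kl_grad) * H * (1.0 - H)
            # gradient steps, with weight decay from the lambda term of Eq. (2)
            V  -= lr * (d_out.T @ H + lam * V)
            b2 -= lr * d_out.sum(axis=0)
            W  -= lr * (d_hid.T @ X + lam * W)
            b1 -= lr * d_hid.sum(axis=0)
        return W, b1

    def encode(X, W, b1):
        """Stage-1 output of STL: the low-dimensional feature representation h."""
        return sigmoid(X @ W.T + b1)

    # Stage 2 of STL: train an RBF-kernel SVM on the encoded labeled data.
    # W, b1 = train_sparse_autoencoder(x_unlabeled, K=20)
    # clf = SVC(kernel="rbf").fit(encode(x_labeled, W, b1), y)
    # predictions = clf.predict(encode(x_test, W, b1))

The point of the sketch is the division of labor: the autoencoder never sees labels, and the SVM never sees the raw high-dimensional features, which is what reduces the SVM training and testing cost.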
TABLE 2. Feature details of the NSL-KDD dataset.

thereby making the feature values incomparable and unsuitable for processing. Hence, these features are normalized by using max–min normalization, which maps all feature values to the range [0, 1] according to Eq. (4):

    x_i = \frac{x_i - \mathrm{Min}}{\mathrm{Max} - \mathrm{Min}}    (4)

where x_i denotes each data point, Min denotes the minimum value over all data points, and Max denotes the maximum value over all data points for each feature.

F. EVALUATION METRICS
We use NSL-KDD (KDDTrain+ and KDDTest+) to verify the superiority of our approach in improving the SVM classification results for network intrusion detection. All performance metrics are used to measure the performance of our proposed approach. The attribute values that result from the training and testing processes on the NSL-KDD dataset are used to calculate these performance metrics. The values can be defined as follows:
• True positive (TP): anomaly instances correctly classified as an anomaly.
• False positive (FP): normal instances wrongly classified as an anomaly.
• True negative (TN): normal instances correctly classified as normal.
• False negative (FN): anomaly instances wrongly classified as normal.
Then, we compute the performance metrics from these counts.
• Accuracy (AC): the proportion of correct classifications out of the total records in the testing set, as shown in (5).

    AC = \frac{TP + TN}{TP + TN + FP + FN}    (5)

• Precision (P): the number of correct intrusion predictions divided by the total number of predicted intrusions in the testing process, as shown in (6).

    P = \frac{TP}{TP + FP}    (6)

• Recall (R): the number of correct intrusion predictions divided by the total number of actual intrusion instances in the testing set, as shown in (7).

    R = \frac{TP}{TP + FN}    (7)

• F-measure (F): considered the most important metric for network intrusion detection because it combines both precision (P) and recall (R), as shown in (8); a short illustrative computation of Eqs. (4)–(8) is sketched below.

    F = \frac{2 \times P \times R}{P + R}    (8)

IV. PERFORMANCE EVALUATION: IMPACT OF THE LOW-DIMENSIONAL FEATURES AND DIFFERENT HIDDEN UNITS AND SPARSITY PARAMETER ON THE SVM CLASSIFIER
Experiments are performed on a PC with an Intel(R) Core(TM) i5-6400 CPU at 2.71 GHz and 8 GB of RAM, running Windows 10. Our approach was implemented in MATLAB, and the SVM classifier is applied with the LIBSVM package [37] (MATLAB version 3.22). Our dataset is processed in Python. The RBF kernel is used for the SVM classifier, and k-fold cross-validation is applied to search for the best parameters of the RBF kernel. The performance evaluation of our approach based on the NSL-KDD dataset is performed in two ways:
• Training (KDDTrain+) and testing (KDDTest+) data are used separately for training and testing.
• Ten-fold cross-validation is performed on KDDTrain+ for training and testing.
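As a concrete illustration of the max–min normalization of Eq. (4) and the metrics of Eqs. (5)–(8), the short Python sketch below computes them with NumPy. The label encoding (1 = anomaly, 0 = normal) and the small guard against division by zero for constant-valued feature columns are illustrative assumptions, not details taken from the paper.

    import numpy as np

    def max_min_normalize(X):
        """Eq. (4): map every feature column of X into [0, 1] using the
        per-feature minimum and maximum over all data points."""
        col_min = X.min(axis=0)
        col_max = X.max(axis=0)
        return (X - col_min) / (col_max - col_min + 1e-12)  # guard for constant columns

    def detection_metrics(y_true, y_pred):
        """Eqs. (5)-(8), assuming 1 = anomaly and 0 = normal."""
        tp = int(np.sum((y_pred == 1) & (y_true == 1)))
        fp = int(np.sum((y_pred == 1) & (y_true == 0)))
        tn = int(np.sum((y_pred == 0) & (y_true == 0)))
        fn = int(np.sum((y_pred == 0) & (y_true == 1)))
        accuracy  = (tp + tn) / (tp + tn + fp + fn)                # Eq. (5)
        precision = tp / (tp + fp)                                 # Eq. (6)
        recall    = tp / (tp + fn)                                 # Eq. (7)
        f_measure = 2 * precision * recall / (precision + recall)  # Eq. (8)
        return accuracy, precision, recall, f_measure

    # Example: 5 test records, 3 of them anomalies.
    # y_true = np.array([1, 0, 1, 1, 0]); y_pred = np.array([1, 0, 0, 1, 1])
    # detection_metrics(y_true, y_pred) -> (0.6, 0.667, 0.667, 0.667)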
Our experiments are conducted to study the performance efficiency and to verify the effectiveness of the low-dimensional features extracted by our approach for binary (normal, anomaly) and multiclass (normal, DoS, R2L, U2R, and Probe) classification based on the NSL-KDD dataset. Furthermore, the training and testing times are calculated to evaluate the efficiency of our model. In addition, we also focus on the requirement that intrusion detection systems be fast and have low computational cost, by reducing the computational complexity and storage requirements of the SVM classifier. To achieve this, we focus on extracting a good data representation and low-dimensional features from the raw data and feeding them into the SVM classifier to reduce its number of support vectors (nSV), because non-linear kernels require memory and computation that grow linearly with the number of SVs [38]. Generally, the storage requirements and computational complexity of an SVM grow linearly with the number of SVs it has [39].

Thus, because the storage requirements and computational complexity of an SVM with the RBF kernel depend on both the input dimensionality (d) and the number of support vectors (nSV), as discussed in Section III-C, our model reduces the storage requirements and computational complexity of SVM compared with SVM alone, owing to its capability to achieve a low-dimensional representation of the raw data and the smaller number of support vectors that need to be stored, as shown in the nSV columns of Tables 3, 4, 5, and 6.

TABLE 3. Training and testing time and number of support vectors (nSV) comparison for STL-IDS and single SVM for binary classification based on testing data.

TABLE 4. Training and testing time and number of support vectors (nSV) comparison for STL-IDS and single SVM for binary classification based on training data.

TABLE 5. Training and testing time and number of support vectors (nSV) comparison for STL-IDS and single SVM for five-category classification based on testing data.

Finally, we compare the performance of our approach with that of existing methods, such as naive Bayesian, RF, multi-layer perceptron, SVM, and shallow machine learning, as mentioned in [22] and [35], and several recent approaches.

A. EVALUATION OF THE IMPACT OF THE LOW-DIMENSIONAL FEATURES ON THE BINARY CLASSIFICATION
1) EVALUATION BASED ON TESTING DATA
Training and testing data are used separately for training and testing when we evaluate the effectiveness of the low-dimensional features extracted by our approach for two-category classification. Figure 3 shows the experimental results. Our STL-IDS performs better than single SVM. The accuracy, precision, recall, and f-measure values for single SVM are 79.42%, 92.59%, 69.40%, and 79.42%, respectively. The corresponding values for STL-IDS are 84.96%, 96.23%, 76.57%, and 85.28%, respectively. Thus, STL-IDS performs better than single SVM in all performance metrics. The experimental results also show that the proposed STL-IDS approach reduces the training and testing times of SVM, as shown in Table 3.

2) EVALUATION BASED ON TRAINING DATA
In this section, we use 10-fold cross-validation to evaluate the superiority of our proposed model through a comparison of its performance metrics and training and testing times with those of single SVM. Figure 4 shows that the performance of our STL-IDS in binary classification is higher than that of single SVM. The accuracy, precision, recall, and f-measure values are 99.416%, 99.45%, 99.291%, and 99.373%, respectively, whereas single SVM achieves 99.35%, 98.98%, 99.62%, and 99.30%, respectively. STL-IDS performs better than single SVM in all performance metrics except recall; the recall values for STL-IDS and single SVM are 99.29% and 99.62%, respectively. Moreover, Table 4 shows that our approach can reduce the training and testing times of SVM, which is crucial for the efficiency of network security applications.
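The comparisons above depend on measuring training time, testing time, and the number of support vectors (nSV) of the RBF-kernel SVM with and without the SAE features. The sketch below shows one way such measurements could be collected; it uses scikit-learn rather than the LIBSVM MATLAB interface used in the paper, and the cross-validated grid of RBF parameters and the variable names are illustrative placeholders.

    import time
    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    def fit_and_profile(X_train, y_train, X_test, y_test, cv=10):
        """Fit a cross-validated RBF-kernel SVM and report accuracy,
        training/testing times, and the total number of support vectors."""
        param_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]}  # illustrative grid
        search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=cv, n_jobs=-1)
        t0 = time.perf_counter()
        search.fit(X_train, y_train)
        train_time = time.perf_counter() - t0
        clf = search.best_estimator_
        t0 = time.perf_counter()
        accuracy = clf.score(X_test, y_test)
        test_time = time.perf_counter() - t0
        n_sv = int(clf.n_support_.sum())  # fewer support vectors -> less storage, faster testing
        return accuracy, train_time, test_time, n_sv

    # Compare raw NSL-KDD features against the SAE-encoded features from stage 1:
    # print(fit_and_profile(X_raw_train, y_train, X_raw_test, y_test))
    # print(fit_and_profile(H_train, y_train, H_test, y_test))  # H = encode(X, W, b1)

Running the two calls side by side gives the kind of nSV and timing comparison reported in the tables above.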
TABLE 8. The tested values of hyper-parameters for our model for binary classification.

TABLE 9. Additional performance comparisons with several related approaches in the binary classification.

FIGURE 10. Accuracy on the KDDTest+ dataset in the binary and multiclass classification with different sparsity parameters.

FIGURE 11. The training time on the KDDTrain+ dataset in the binary and multiclass classification with different sparsity parameters.

TABLE 10. Additional performance comparisons with several related approaches in the multiclass classification.
Wang et al. [23] also claimed that their model achieved an accuracy of 99.31% for binary classification on the basis of 10-fold cross-validation on KDDTrain+ (the training data part of the NSL-KDD dataset).

Table 9 demonstrates that our model achieves better accuracy than the model in [23]. However, their model performs better in terms of training time. Wang et al. [23] used the logarithm marginal density ratio transformation (LMDRT) to verify their model's efficiency in enhancing the accuracy and reducing the training time of SVM. The weakness of their method is that the reduction in testing time is not considered, unlike in our proposed method. Yousefi-Azar et al. [10] evaluated their feature learning model based on a deep AE and applied it with many shallow machine learning algorithms, such as SVM. However, the highest accuracy they achieved was 83.30%. As demonstrated in Table 9, our model is superior to their model. Moreover, their model is computationally expensive because it contains many hidden layers and involves two training stages.

Javaid et al. [2] claimed that their deep learning approach for network intrusion detection, which depends on STL that combines SAE with soft-max, obtains an accuracy of 88.39% and 79.10% for two-category and five-category classification, respectively. Their approach was evaluated on KDDTest+ (the testing data of the NSL-KDD dataset). Their model obtained an accuracy of less than 99% for two-category and five-category classification based on KDDTrain+, which was evaluated using 10-fold cross-validation. The experimental results show that our method outperforms the model in [2] by 1.38% in terms of detection accuracy rate when KDDTrain+ and KDDTest+ are used separately for training and testing for five-category classification. For the KDDTrain+ dataset, our method achieves better results than [2]: its accuracy for two-category and five-category classification is 99.423% and 99.414%, respectively. This experimental evidence and the comparison of our method with that of [2] demonstrate that our method achieves better results than [2] in terms of detection accuracy rate and time complexity because we used good feature representation.
[24] D. Perez, M. A. Astor, D. P. Abreu, and E. Scalise, "Intrusion detection in computer networks using hybrid machine learning techniques," in Proc. 43rd Latin Amer. Comput. Conf. (CLEI), Sep. 2017, pp. 1–10.
[25] A. A. Diro and N. Chilamkurti, "Distributed attack detection scheme using deep learning approach for Internet of Things," Future Gener. Comput. Syst., vol. 82, pp. 761–768, May 2018.
[26] W. Wang et al., "HAST-IDS: Learning hierarchical spatial-temporal features using deep neural networks to improve intrusion detection," IEEE Access, vol. 6, pp. 1792–1806, 2018.
[27] F. Farahnakian and J. Heikkonen, "A deep auto-encoder based approach for intrusion detection system," in Proc. 20th Int. Conf. Adv. Commun. Technol. (ICACT), Feb. 2018, p. 1.
[28] P. Madani and N. Vlajic, "Robustness of deep autoencoder in intrusion detection under adversarial contamination," in Proc. 5th Annu. Symp. Bootcamp Hot Topics Sci. Secur., 2018, Art. no. 1.
[29] R. Raina, A. Battle, H. Lee, B. Packer, and A. Y. Ng, "Self-taught learning: Transfer learning from unlabeled data," in Proc. 24th Int. Conf. Mach. Learn., 2007, pp. 759–766.
[30] A. Coates, A. Ng, and H. Lee, "An analysis of single-layer networks in unsupervised feature learning," in Proc. 14th Int. Conf. Artif. Intell. Statist., 2011, pp. 215–223.
[31] A. Ng, "Sparse autoencoder. CS294A lecture notes," Stanford Univ., Stanford, CA, USA, Tech. Rep. 72, 2011.
[32] H. Liu, T. Taniguchi, T. Takano, Y. Tanaka, K. Takenaka, and T. Bando, "Visualization of driving behavior using deep sparse autoencoder," in Proc. IEEE Intell. Vehicles Symp., Jun. 2014, pp. 1427–1434.
[33] J. Deng, Z. Zhang, E. Marchi, and B. Schuller, "Sparse autoencoder-based feature transfer learning for speech emotion recognition," in Proc. Humaine Assoc. Conf. Affect. Comput. Intell. Interact., Sep. 2013, pp. 511–516.
[34] C.-F. Tsai, Y.-F. Hsu, C.-Y. Lin, and W.-Y. Lin, "Intrusion detection by machine learning: A review," Expert Syst. Appl., vol. 36, no. 10, pp. 11994–12000, 2009.
[35] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in Proc. IEEE Symp. Comput. Intell. Secur. Defense Appl., Jul. 2009, pp. 1–6.
[36] N. Paulauskas and J. Auskalnis, "Analysis of data pre-processing influence on intrusion detection using NSL-KDD dataset," in Proc. Open Conf. Elect. Electron. Inf. Sci. (eStream), Apr. 2017, pp. 1–5.
[37] C. L. C. Chang. Libsvm. Accessed: Jan. 5, 2018. [Online]. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
[38] S. Maji, A. C. Berg, and J. Malik, "Classification using intersection kernel support vector machines is efficient," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2008, pp. 1–8.
[39] P. Ilayaraja, N. V. Neeba, and C. V. Jawahar, "Efficient implementation of SVM for large class problems," in Proc. 19th Int. Conf. Pattern Recognit., Dec. 2008, pp. 1–4.
[40] W. L. Al-Yaseen, Z. A. Othman, and M. Z. A. Nazri, "Multi-level hybrid support vector machine and extreme learning machine based on modified k-means for intrusion detection system," Expert Syst. Appl., vol. 67, pp. 296–303, Jan. 2017.
[41] Y. Li, R. Ma, and R. Jiao, "A hybrid malicious code detection method based on deep learning," Methods, vol. 9, no. 5, pp. 205–216, 2015.
[42] N. Gao, L. Gao, Q. Gao, and H. Wang, "An intrusion detection model based on deep belief networks," in Proc. 2nd Int. Conf. Adv. Cloud Big Data, Nov. 2014, pp. 247–252.
[43] K. Alrawashdeh and C. Purdy, "Toward an online anomaly intrusion detection system based on deep learning," in Proc. 15th IEEE Int. Conf. Mach. Learn. Appl. (ICMLA), Anaheim, CA, USA, Feb. 2017, pp. 195–200.
[44] R. C. Staudemeyer, "Applying long short-term memory recurrent neural networks to intrusion detection," South Afr. Comput. J., vol. 56, no. 1, pp. 136–154, 2015.

MAJJED AL-QATF received the B.S. degree in network technology and computer security from Sana'a University, Sana'a, Yemen, in 2013. He is currently pursuing the M.S. degree in computer science with the School of Information Science and Engineering, Central South University, Changsha, China. His research interests include deep learning, information security, and data mining.

YU LASHENG received the B.Sc. degree in computer science and the M.S. and Ph.D. degrees in control theory and control engineering from Central South University, China. He is currently a Vice Professor with Central South University. He has authored at least 70 papers on agent technologies or algorithms and three books. He has organized and implemented many projects that have greatly benefitted our society. His main research interests include smart computing, agent technologies and applications, structure and algorithm, and distributed computing. He is an ACM and CCF Member, and an ACM/ICPC Golden Medal Coach. He is an Editor of the Journal of Convergence Information Technology and the Advances in Information Sciences and Service Sciences. He is also a reviewer for Future Generation Computer Systems, Journal of Parallel and Distributed Computing, Artificial Intelligence Review, and some other journals and conferences.

MOHAMMED AL-HABIB received the B.S. degree in computer sciences and information systems from Thamar University, Thamar, Yemen, in 2011. He is currently pursuing the M.S. degree in computer science with the School of Information Science and Engineering, Central South University, Changsha, China. His research interests include deep learning, computer vision, and data mining.

KAMAL AL-SABAHI received the B.S. degree in computer science from Sana'a University, Sana'a, Yemen, in 2008, and the M.S. degree in information technology from OUM University, Kuala Lumpur, Malaysia, in 2015. He is currently pursuing the Ph.D. degree in computer science with the School of Information Science and Engineering, Central South University, Changsha, China. His research interests include deep learning, natural language processing, knowledge engineering, and data mining.