Network Intrusion Detection: Based On Deep Hierarchical Network and Original Flow Data
April 3, 2019.
Digital Object Identifier 10.1109/ACCESS.2019.2905041
ABSTRACT Network intrusion detection plays a very important role in protecting computer network security. Detecting and analyzing abnormal traffic by extracting statistical features of flows is the main analysis method in the field of network intrusion detection. However, these features must be designed and extracted manually, which often loses original information of the flow and leads to poor detection efficiency. In this paper, we do not manually design flow features but directly extract the raw data of the flow for analysis. In addition, we propose a new network intrusion detection model, the deep hierarchical network, which integrates improved LeNet-5 and LSTM neural network structures to learn the spatial and temporal features of the flow at the same time. By designing a reasonable network cascading method, we can train the proposed hierarchical network as a whole instead of training the two networks separately. We use the CICIDS2017 dataset and the CTU dataset. Both contain a large number and variety of flows, and their attack types are relatively new. The experimental results show that the proposed hierarchical network model performs significantly better than other network intrusion detection models and achieves the best detection accuracy. Finally, we present a method for analyzing traffic features that identifies the features contributing most to abnormal traffic detection and gives the actual meanings of these important features.
INDEX TERMS Network intrusion detection, deep hierarchical network, raw feature, feature importance.
2169-3536 © 2019 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. VOLUME 7, 2019
Y. Zhang et al.: Network Intrusion Detection: Based on Deep Hierarchical Network and Original Flow Data
The payload-based traffic detection method [5], [6] uses application layer protocol information to express the features of the traffic; its most representative form is deep packet inspection (DPI) [7]. DPI needs to decrypt and re-encrypt the transmitted traffic data. By modeling and analyzing the transmitted data, it can detect malicious traffic packets very effectively. Although DPI is widely used for abnormal traffic detection in practice, the rise of encryption protocols (such as HTTPS) and the increasing emphasis on privacy mean that it is no longer recommended. In addition, decrypting traffic for DPI is very expensive: with the rapid growth of Internet traffic, DPI consumes huge computing resources when decrypting traffic packets.
The statistical-feature-based traffic detection method [8] generally uses the packet arrival time, the packet size and statistics of the traffic packet fields (e.g., average, maximum, minimum) to express the attributes of the traffic. Using these manually designed features with machine learning algorithms to analyze and detect abnormal traffic [9] has become a relatively reliable approach, but the traffic data needs to be accurately labeled when training a supervised model.
In previous work, researchers mainly worked at the data level to improve classification accuracy and other metrics. Whether they use traditional machine learning algorithms or the various neural network algorithms of deep learning, researchers try to extract information from traffic data through complex feature engineering. Such feature engineering can extract temporal and spatial features of the flow data, but it loses some information or changes the original temporal and spatial features of the traffic packets. Yeo et al. [10] extracted temporal features such as fiat, biat and duration, while Yu et al. [11] extracted temporal features such as flow activation time, time interval and packet arrival time, and spatial features such as packet number, IP address and transmission direction. With features extracted in this way, algorithms can only classify using incomplete traffic information, so classification accuracy and other metrics have reached a bottleneck and can hardly continue to improve.
This paper uses deep learning to classify flows. A neural network model can automatically extract features from the input data during training. It has good self-adaptation, self-organization and generalization ability, which gives the system higher detection efficiency. The proposed method uses only the original information of the traffic data as flow features, and uses the hierarchical network structure to automatically learn the spatial and temporal features of the flow without complex feature engineering. By analyzing the experimental results, we find that the spatial and temporal features extracted by separate CNN and LSTM models have shortcomings similar to those of feature engineering. The data does contain rich features with classification capability, but since a separate CNN or LSTM uses only the spatial or only the temporal features of the flow, some information is effectively discarded. To further improve classification accuracy and other metrics, we need to extract the spatial and temporal features of the flow simultaneously using a hierarchical network. Code has been released at https://github.com/chenxu93/abnormal-traffic. The main contributions of this paper are as follows:
(1) We propose a new method for extracting flow features, which preserves as much of the information in the flow as possible. The flow features we extract do not require any prior knowledge, so we do not need to manually extract flow features with specific meanings.
(2) For the first time, we propose a new deep hierarchical network model structure that learns temporal and spatial features simultaneously from the original flow data. Our model achieves the best performance on all metrics.
(3) We propose a method for analyzing flow features, which finds the features that contribute significantly to abnormal flow detection, and we give the true meanings of these important features.
The structure of this paper is as follows. Section II describes related work on network intrusion detection. Section III details the abnormal flow classification and detection model used in this paper. In Section IV, we describe the two datasets used in this paper and show the experimental results on them. In Section V, we analyze the flow features that contribute most to abnormal flow detection. Finally, Section VI concludes the paper.

II. RELATED WORKS
The concept of intrusion detection technology was first proposed by Anderson [12] in 1980, with the goal of identifying anomalous behaviors in the network and reducing network losses by taking appropriate measures against them. Currently, many researchers classify traffic as normal or abnormal by extracting character or numeric features from traffic packets.
Fahad et al. [13] proposed a Global Optimization Approach (GOA) and used feature selection methods to classify spatial and temporal domain traffic data with 97% accuracy. Bang et al. [14] extracted the temporal and spatial features of traffic data from LTE signaling and used the semi-Markov model to detect attacks in wireless sensor networks. Their method can effectively separate attack nodes, and its false positive rate is very low. Yang [15] proposed a new abnormal network traffic detection algorithm for the cloud computing environment: an Ent-SVM abnormal traffic detection framework that mainly considers the source IP address number, source port number, destination IP address number, destination port number, packet type number and network packet number. By calculating
the mixed information entropy and the eigenvalues of the canonical network, the SVM algorithm is used for intrusion detection. The proposed model can detect abnormal network traffic with high precision on large-scale datasets. Ertam and Avcı [16] proposed the GA-WK-ELM network traffic classification model, which uses genetic algorithms to select the best parameters for the wavelet-kernel-based Extreme Learning Machine (WK-ELM). Through this parameter tuning, the accuracy of traffic classification exceeds 95%. Nezhad et al. [17] extracted the number of packets and the number of source IP addresses per minute from the network traffic as indicators for detecting DoS and DDoS attacks. They built a time series of packet numbers and normalized it using the Box-Cox transformation. An ARIMA model predicts the number of packets in each subsequent minute, and the chaotic behavior of the prediction error time series is then detected by calculating the maximum Lyapunov exponent. Their simulation shows that the number of data packets and the number of source IP addresses increase sharply during an attack, and the classification accuracy for normal and attack traffic reaches 99.5%. Li et al. [18] proposed a multi-layer anomaly traffic detection model, which extracts the features of different network layers and uses PCA and random forest algorithms to remove redundant features. The detection accuracy and false positive rate of the model are improved by obtaining high-quality features. Roy et al. [19] designed a response feature from the KDD Cup99 dataset and classified the traffic using a deep neural network. The experimental results show that the deep neural network has better classification accuracy than SVM. Zhou et al. [20] extracted 256 features from the flow and mapped them into 16*16 grayscale images, and then used an improved convolutional neural network to classify flows. Their model classifies data types with large data volume well, but classifies data types with small data volume very poorly. Yuan et al. [21] proposed a recurrent neural network model. They extracted 20 fields from sequences of consecutive flow packets and generated a three-dimensional feature map using a sliding time window of length T. Their experiments found that the proposed model reduced the error rate by about 5 percentage points compared with traditional machine learning algorithms. Kim et al. [22] used an LSTM network to perform five-class classification on the KDD dataset. Although the classification results are ideal, the KDD dataset is too old and contains only four types of attacks, which are no longer sufficient for today's network intrusion detection research. However, we found that the methods mentioned above use different flow features, and the datasets they use were released long ago and do not include recent attack types. In addition, most researchers use a shallow classification model, which can achieve good classification results when the feature dimension is small but performs poorly when the amount of data and the feature dimension are large.
In this paper, we do not manually design and extract character or statistical features of the flow. Instead, we extract the original hexadecimal codes of the flow and map them to equal-length decimal numbers, which serve as the flow features. We designed an improved deep hierarchical network model to classify flows, and used the CICIDS2017 and CTU datasets in the experiments. These two new datasets contain a large number of traffic packets and attack types. The experimental results show that the proposed model still achieves 99.9% classification accuracy with more types and a larger volume of traffic. In the experimental section, we compare in detail with the existing method of Wang et al. [23], find that our model has fewer parameters and a very low missed detection rate, and show experimentally that our model converges rapidly. The differences between our proposed solution and existing methods such as BWManager [24] and the LTE signaling attack detection scheme [14] include: 1) We use deep learning methods rather than traditional machine learning algorithms or statistical learning methods. 2) Our method analyzes the original traffic data generated by network users, rather than the resources of the communication system, for attack detection. 3) Our approach can detect not only network attacks in specific networks such as SDN but also most common attacks on the Internet, and requires only the traffic data generated by these networks. Therefore, our method can detect a large number of attack types and, more importantly, by using deep learning it can meet the needs of attack detection in the big data environment.

III. METHODOLOGY
In this section, we design an anomaly traffic detection model named the deep hierarchical network. It consists of two levels of neural network models. The first level is based on an improved LeNet-5 convolutional neural network and extracts the spatial features of the flow; the second level uses an LSTM network to extract the temporal features of the flow. The two networks are cascaded into a hybrid network and trained simultaneously, so that the network automatically extracts both the spatial and temporal features of the flow. Before introducing the deep hierarchical network, we first introduce the composition of the traffic data used to train the model.

A. DATA PREPROCESSING
In this paper, the original traffic packets are used for network intrusion detection analysis. Compared with the commonly used manual methods of extracting data from traffic packets, our method retains all the feature information of each traffic packet. We do not need to filter or design the traffic features to be extracted. In Wireshark, we can see that the original traffic packets are sequences of hexadecimal codes, as shown in FIGURE 1.
The process of extracting traffic features is as follows:
(1) data: Each traffic packet has an Ethernet layer, a network layer, a transport layer, and an application layer. In this
\[ c_i = \sigma_r(\omega \ast X_i + b_i) \tag{1} \]
FIGURE 3. LSTM cell.
$h_{t-1}$ and the input layer $x_t$, and then outputs a value between 0 and 1 to the cell state $C_{t-1}$ through the activation function. A value of 1 indicates that the information is completely retained, and 0 indicates that it is completely discarded.
\[ f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \tag{2} \]
(2) input gate: The input gate determines how much new information is stored in the cell state. The update of the cell state consists of two steps: first, the input gate layer (a sigmoid layer) determines the values to be updated in the cell, and then the tanh layer creates a candidate value vector $\tilde{C}_t$ to be added to the cell state.
\[ i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \tag{3} \]
\[ \tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \tag{4} \]
\[ C_t = f_t \ast C_{t-1} + i_t \ast \tilde{C}_t \tag{5} \]
(3) output gate: To determine the final output value, it is necessary to determine the state of the cell. First, a sigmoid layer determines which parts of the cell state to output. Then, the cell state is passed through a tanh layer and multiplied by the output of that sigmoid layer to give the final output value. The purpose of the tanh layer is to map cell state values to between -1 and 1.
\[ o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \tag{6} \]
\[ h_t = o_t \ast \tanh(C_t) \tag{7} \]
The arrival times of the traffic packets in each flow differ, as do the values of fields such as TTL. Unlike the methods for handling temporal features used by Feghhi and Leith [30] and Shen et al. [31], this paper uses the LSTM network to perform automatic temporal feature extraction on the original flow data. In this paper, the LSTM network uses two layers of cells for temporal feature extraction. Each LSTM cell uses 256 hidden units. The cell activation function of each layer is the sigmoid function. The last layer of the LSTM network is a fully connected layer whose number of neurons equals the number of flow classes.

D. DEEP HIERARCHICAL NETWORK
Ahuja [27] showed that network flows contain a large number of features that can be analyzed. However, these features are based on statistics. Such hand-designed features cannot express the temporal and spatial characteristics of flows when used with traditional algorithms. They transform the intrinsic features of flows from the very beginning and also lose some of them, so the high-level semantics of flows cannot be fully represented. The CNN and LSTM networks, with greater depth and using the original flow data, can learn highly semantic features and improve performance on all metrics.
Since the CNN and LSTM networks can each extract only the spatial or the temporal features of flows and cannot fully express all the feature information of the traffic, this paper extracts the spatial and temporal features of the flow simultaneously by combining the CNN and LSTM networks into a hybrid network structure model. The hybrid model is divided into two parts. Since the inputs to the CNN and LSTM network structures have different forms, we reshape the spatial features output by the CNN at the junction between the CNN and the LSTM network. For each flow we extract the first 10 traffic packets, and for each traffic packet only the first 160 bytes. To correspond to each traffic packet, we set the input size of the LSTM network to 160 and the number of input time steps to 10. The output of the deep hierarchical network model is the probability that the flow belongs to each class, and its structure is shown in FIGURE 4.
In this paper, the deep hierarchical network uses a softmax classifier, which outputs the class probability for each type of flow. The index with the highest probability is the hierarchical network's classification result for a flow. The loss function used in the model is the mean square loss function, and the training optimizer is AdamOptimizer [32], which uses adaptive moment estimation for gradient descent. The training and testing process of the deep hierarchical network structure model is shown in Algorithm 2.

IV. EXPERIMENTS
In this section, we perform three different experiments on each of the CICIDS2017 and CTU datasets. In the first experiment, we use the CNN model to extract the spatial features of the flow and classify it. In the second experiment, we use the LSTM model to extract the temporal features of the flow and classify it. In the third experiment, we extract the spatial and temporal features of the flow using the proposed deep hierarchical network model and classify it. Our experimental environment is as follows:
CPU: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
GPU: GTX1080ti 11GB
RAM: 32GB
OS: Ubuntu 16.04

A. DATASET
As described by Weller-Fahy et al. [1], a key issue with most intrusion detection datasets is the lack of a sufficient number and variety of traffic packets. This article uses two different datasets to conduct experiments. Both were recently released and contain more traffic and more traffic types, which makes them more reliable for validation and testing than other datasets.
(1) CICIDS2017 Dataset
The CICIDS2017 dataset is an intrusion detection and intrusion prevention dataset that was open sourced by the Canadian Institute for Cybersecurity in 2017. Sharafaldin et al. [33] designed a real attack scenario to collect traffic data by building an attack network and a victim network. This dataset collects benign traffic and the most common attack traffic from Monday to Friday and
gives real-world pcap files. On Monday, only benign traffic was collected, with no attack traffic. From Tuesday through Friday, the attack network launched attacks on the victim network, and the traffic generated by the network was collected for a fixed period of time each day. Finally, the authors accurately labeled the flows according to the flow timestamp, source IP, destination IP, source port, destination port and protocol.
This paper extracts benign flows and 10 types of attack flows from the CICIDS2017 dataset as the training and test data for the deep hierarchical network model, and extracts the flow features from the original flow data as described in Section III. We labeled the flows we generated according to the CICIDS2017 labeling method to obtain real and reliable labels. The number and type distribution of the extracted flows are shown in TABLE 1.
TABLE 2. CTU dataset.
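The raw-flow feature layout described in this paper (the first 10 packets of a flow, the first 160 bytes of each packet, hexadecimal codes mapped to decimal numbers, 1600 values per flow) can be illustrated with a minimal Python sketch. It is not the authors' released code: the packet dictionary layout, the helper names `group_by_five_tuple`, `packet_to_features` and `flow_to_features`, and the use of zero padding for short packets or short flows are assumptions made only for this illustration.

```python
from collections import defaultdict

PACKETS_PER_FLOW = 10   # first 10 packets of each flow (value taken from the paper)
BYTES_PER_PACKET = 160  # first 160 bytes of each packet (value taken from the paper)


def group_by_five_tuple(packets):
    """Group packet hex dumps into flows keyed by the 5-tuple.

    `packets` is a hypothetical list of dicts with the keys used below.
    """
    flows = defaultdict(list)
    for pkt in packets:
        key = (pkt["src_ip"], pkt["dst_ip"],
               pkt["src_port"], pkt["dst_port"], pkt["proto"])
        flows[key].append(pkt["hex"])
    return flows


def packet_to_features(hex_string, length=BYTES_PER_PACKET):
    """Map a hex dump to `length` decimal byte values in 0..255.

    Longer packets are truncated; shorter ones are zero-padded (an assumption).
    """
    raw = bytes.fromhex(hex_string)[:length]
    return list(raw) + [0] * (length - len(raw))


def flow_to_features(packet_hex_list):
    """Build a 10 x 160 matrix for one flow, zero-padding missing packets."""
    rows = [packet_to_features(h) for h in packet_hex_list[:PACKETS_PER_FLOW]]
    while len(rows) < PACKETS_PER_FLOW:
        rows.append([0] * BYTES_PER_PACKET)
    return rows


# Two toy packets belonging to the same (made-up) flow.
packets = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 1234,
     "dst_port": 80, "proto": "TCP", "hex": "4500003c"},
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 1234,
     "dst_port": 80, "proto": "TCP", "hex": "ff00"},
]
flows = group_by_five_tuple(packets)
features = flow_to_features(next(iter(flows.values())))
flat = [value for row in features for value in row]

print(len(features), len(features[0]))  # 10 160
print(features[0][:4])                  # [69, 0, 0, 60]
print(len(flat))                        # 1600
```

Each row of the 10 x 160 matrix corresponds to one LSTM time step with input size 160, and the flattened 1600-value vector matches the feature dimension used in the feature importance analysis.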
are actually positive samples but are classified by the model as negative samples.
D. RESULTS
Using the original features we extracted from the flow, we perform binary classification and multi-classification with the CNN model, the LSTM model and the deep hierarchical network model respectively. The binary classification experiment classifies flows as normal or abnormal, and the multi-classification experiment classifies flows into one normal class and ten abnormal classes. The experimental results on the CICIDS2017 dataset are shown in TABLE 3, and the experimental results on the CTU dataset are shown in TABLE 4. CNN2 indicates binary classification with the CNN algorithm, CNN11 indicates multi-classification with the CNN algorithm (1 benign plus 10 abnormal flow classes), and CNN+LSTM2 indicates binary classification with our proposed deep hierarchical network. LSTM2, LSTM11 and CNN+LSTM follow the same convention.
From the experimental results on the CICIDS2017 dataset in TABLE 3, we find that our proposed deep hierarchical network model performs better than the traditional machine learning models of Sharafaldin et al. [33], who manually extracted 80-dimensional features from each flow for learning. Our model achieves better results on the three metrics of precision, recall and F1-measure (improving classification accuracy by about 3%). At the same time, the accuracy metric we report shows that our proposed deep hierarchical network model has good detection efficiency for abnormal traffic. The proposed hierarchical network model shows only a slight performance improvement over the CNN or LSTM model alone. However, because the amount of traffic data in a real network environment is very large, it is important to detect as many traffic packets with attack behaviors as possible.
On the CTU dataset, the experimental results of our proposed deep network model are shown in TABLE 4 and compared with the experimental results of Huang et al. [34]. Huang's method reports only the two metrics of precision and recall, whereas we report the four metrics of accuracy, precision, recall and F1-measure. The experimental results show that our model is better on both precision and recall. We report values with higher precision so that performance can be compared between models more effectively.
TABLE 3. CICIDS2017 dataset experiment results.
TABLE 4. CTU dataset experiment results.
The three experiments with the CNN model, the LSTM model and the deep hierarchical network model on the two datasets show that the CNN and LSTM models can extract the spatial and temporal features of the flow separately. The separate CNN and LSTM models achieve good classification results in both the binary classification and multi-classification experiments. However, by extracting the spatial and temporal features of the flow at the same time, our proposed deep hierarchical network model further improves these classification metrics. This shows that our proposed deep hierarchical network model can indeed learn deeper abstract features of the flow and perform better. The experimental results on both datasets are very good, indicating that our model has good generalization ability.
To study the influence of input data size and type on the experimental results, we further studied the impact of header data alone and payload data alone on classification accuracy. Specifically, for each flow we extract the header and payload of the first five traffic packets. For each traffic packet, we extract the first 50 bytes of the header and of the payload respectively. By padding, we extend a flow to a 256-dimensional feature vector and then reshape it to a size of 16*16. We used the header and payload raw data to conduct multi-classification experiments on the models we designed. The experimental results are shown in TABLE 5.
TABLE 5. Influence of input data size on classification accuracy.
From the experimental results in TABLE 5, we find that the packet header information has more classification capability than the payload information. In particular, when the payload information is used alone, the model does not have the ability
TABLE 9. Machine learning algorithms convergence analysis.
To compare with traditional machine learning algorithms, we trained five classifiers, KNN, Naive Bayes (NB), Logistic Regression (LR), Random Forest (RF) and Decision Tree (DT), on the CICIDS2017 dataset, each performing multi-classification on the original flow data. The accuracy, training time and test time of these algorithms are shown in TABLE 9. Random Forest achieves the highest classification accuracy among them, but this is still lower than the result of our proposed algorithm (0.998111). In terms of convergence, the five algorithms differ considerably. The test time of KNN is about 8 hours, because the algorithm needs to calculate a large number of Euclidean distances. The test time of Random Forest and Decision Tree is low, because the tree depth parameter is set relatively small to prevent overfitting. This indicates that only a small number of features are required to recognize abnormal traffic, and that these strongly separable features come from header fields (TABLE 5 shows that the header is the main separable feature). In addition, because the transmitted payload in the dataset is very small, the feature matrix is very sparse, which also reduces the test time of the Random Forest and Decision Tree algorithms. In a real network environment, an attacker usually does not send only a small amount of payload. In that case, the feature matrix will not be sparse and the test time will be longer.
during the construction of the ensemble tree, then the more important this feature is.
2) GAIN-BASED
The gain-based method is a classical feature importance analysis method proposed by Breiman [36] in 1984. Gain is the contribution of a feature to the reduction of loss or impurity over all the splits made on that feature. The gain of feature A in a tree is calculated as $g(D, A) = H(D) - H(D|A)$, where $H(D)$ is the information entropy of set D in a given tree and $H(D|A)$ is the conditional entropy of set D given feature A. The larger the gain, the more important the feature is.
3) COVER-BASED
The cover-based method measures the relative number of observations associated with a given feature in the trees. For example, suppose there are 100 observations, 3 trees and 4 different features in the tree ensemble, and the node observations of feature 1 in the three trees are 10, 5 and 2; then the cover of feature 1 is 17. Similarly, the larger the cover value of a feature, the more important the feature is.

B. RESULTS
Three different feature importance analysis methods were used to analyze the original flow data extracted from the CICIDS2017 dataset. We performed binary classification and multi-classification experiments on the 1600-dimensional features. The experimental results are shown in TABLE 10 and TABLE 11.
TABLE 10. Feature importance analysis: binary classification.
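As a small worked illustration of the gain and cover definitions used in the feature analysis, the following Python sketch computes $g(D, A) = H(D) - H(D|A)$ for a discrete feature and reproduces the cover example (10 + 5 + 2 = 17). It is only a sketch under simplifying assumptions: the feature is treated as categorical rather than split on a threshold as tree learners do, and the function names and toy labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(D), in bits, of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conditional_entropy(feature_values, labels):
    """H(D|A): weighted entropy of the labels after partitioning by feature value."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    return sum(len(group) / total * entropy(group) for group in groups.values())


def information_gain(feature_values, labels):
    """g(D, A) = H(D) - H(D|A)."""
    return entropy(labels) - conditional_entropy(feature_values, labels)


def cover(node_observations):
    """Cover of a feature: total observations at the nodes that split on it."""
    return sum(node_observations)


# Toy data: one feature separates the classes perfectly, the other not at all.
labels = ["attack", "attack", "benign", "benign"]
separating_feature = [1, 1, 0, 0]
uninformative_feature = [1, 0, 1, 0]

print(information_gain(separating_feature, labels))     # 1.0
print(information_gain(uninformative_feature, labels))  # 0.0
print(cover([10, 5, 2]))                                 # 17
```

A feature that splits the labels cleanly removes all uncertainty (gain equal to H(D)), while a feature independent of the labels has zero gain, which is why a larger gain indicates a more important feature.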
TABLE 11. Feature importance analysis: multi-classification. contributed significantly to abnormal traffic detection and
found features that were rarely used by previous researchers.
In the future work, we will design a traffic collection sys-
tem by ourselves. Use our designed traffic acquisition system
to collect real-world traffic data under the environment of our
lab for analysis to detect suspicious attack traffic and evaluate
test results. In addition, we will improve our hierarchical
network model to make the network deeper, enabling the
model to detect unknown types of attacks that have not been
trained.
TABLE 12. The actual meanings of the fields.
REFERENCES
[1] D. J. Weller-Fahy, B. J. Borghetti, and A. A. Sodemann, ‘‘A survey of
distance and similarity measures used within network intrusion anomaly
detection,’’ IEEE Commun. Surveys Tuts., vol. 17, no. 1, pp. 70–91,
Jan. 2015.
[2] M. Ahmed, A. N. Mahmood, and J. Hu, ‘‘A survey of network anomaly
detection techniques,’’ J. Netw. Comput. Appl., vol. 60, pp. 19–31,
Jan. 2016.
[3] J. Zhang, C. Chen, Y. Xiang, W. Zhou, and Y. Xiang, ‘‘Internet traffic
classification by aggregating correlated naive Bayes predictions,’’ IEEE
Trans. Inf. Forensics Security, vol. 8, no. 1, pp. 5–15, Jan. 2013.
[4] A. Dainotti, F. Gargiulo, L. I. Kuncheva, A. Pescapè, and C. Sansone,
‘‘Identification of traffic flows hiding behind TCP Port 80,’’ in Proc. IEEE
Int. Conf. Commun., May 2010, pp. 1–6.
[5] S. Sen, O. Spatscheck, and D. Wang, ‘‘Accurate, scalable in-network
identification of p2p traffic using application signatures,’’ in Proc. 13th
Int. Conf. World Wide Web, pp. 512–521.
[6] K. Wang, G. Cretu, and S. J. Stolfo, ‘‘Anomalous payload-based worm
detection and signature generation,’’ in Proc. Int. Workshop Recent Adv.
Intrusion Detection, 2005, pp. 227–246.
[7] R. T. El-Maghraby, N. M. A. Elazim, and A. M. Bahaa-Eldin, ‘‘A survey
on deep packet inspection,’’ in Proc. 12th Int. Conf. Comput. Eng. Syst.
(ICCES), Dec. 2017, pp. 188–197.
FIGURE 5. Important feature weighted score.
the urgent pointer, window size and acknowledgment number fields, we can conclude that
malicious flows are usually sent out in more slices. We found that several of the top 9
features have rarely been used by researchers in the field of network intrusion detection.
VI. CONCLUSION
We argue that manually designing and extracting flow features for network intrusion
detection loses part of the traffic information and thus reduces detection accuracy.
In this paper, we extract the original information of the flow and use our proposed
hierarchical network to detect abnormal flows. Our hierarchical network is a specially
designed CNN and LSTM model that learns spatial and temporal features from the original
flow information. To the best of our knowledge, this is the first time that the original
information of the flow has been used for feature learning. The hierarchical network
model we propose performs significantly better than other network intrusion detection
models. In this paper, we use the CICIDS2017 dataset and the CTU dataset. The experimental
results on these two datasets show that our proposed model can achieve very high accuracy,
precision, recall and F1-measure. At the same time, we analyzed the features that have
[8] D. E. Denning, ‘‘An intrusion-detection model,’’ IEEE Trans. Softw. Eng.,
vol. SE-13, no. 2, pp. 222–232, Feb. 1987.
[9] A. Vlăduţu, D. Comăneci, and C. Dobre, ‘‘Internet traffic classification
based on flows’ statistical properties with machine learning,’’ Int. J. Netw.
Manage., vol. 27, no. 3, p. e1929, May 2017.
[10] M. Yeo et al., ‘‘Flow-based malware detection using convolutional neural
network,’’ in Proc. Int. Conf. Inf. Netw. (ICOIN), Jan. 2018, pp. 910–913.
[11] Y. Yu, J. Long, and Z. Cai, ‘‘Session-based network intrusion detection
using a deep learning architecture,’’ in Modeling Decisions for Artificial
Intelligence. V. Torra, Y. Narukawa, A. Honda, and S. Inoue, Eds. Cham,
Switzerland: Springer, 2017, pp. 144–155.
[12] J. P. Anderson, ‘‘Computer security threat monitoring and surveillance,’’
James P. Anderson Company, Washington, DC, USA, Tech. Rep. 19034,
1980.
[13] A. Fahad, Z. Tari, I. Khalil, A. Almalawi, and A. Y. Zomaya, ‘‘An optimal
and stable feature selection approach for traffic classification based on
multi-criterion fusion,’’ Future Gener. Comput. Syst., vol. 36, pp. 156–169,
Jul. 2014.
[14] J.-H. Bang, Y.-J. Cho, and K. Kang, ‘‘Anomaly detection of network-initiated
LTE signaling traffic in wireless sensor and actuator networks
based on a hidden semi-Markov model,’’ Comput. Secur., vol. 65,
pp. 108–120, Mar. 2017.
[15] C. Yang, ‘‘Anomaly network traffic detection algorithm based on information
entropy measurement under the cloud computing environment,’’
Cluster Comput., vol. 21, pp. 1–9, Jan. 2018. [Online]. Available:
https://link.springer.com/journal/volumesAndIssues/10586
[16] F. Ertam and E. Avcí, ‘‘A new approach for internet traffic classification:
GA-WK-ELM,’’ Measurement, vol. 95, pp. 135–142, Jan. 2017.
[17] S. M. T. Nezhad, M. Nazari, and E. A. Gharavol, ‘‘A novel DoS and
DDoS attacks detection algorithm using ARIMA time series model and
chaotic system in computer networks,’’ IEEE Commun. Lett., vol. 20, no. 4,
pp. 700–703, Apr. 2016.
[18] B. Li, S. Zhang, and K. Li, ‘‘Towards a multi-layers anomaly detection
framework for analyzing network traffic,’’ Concurrency Comput. Pract.
Exper., vol. 29, no. 14, p. e3955, Jul. 2017.
[19] S. S. Roy, A. Mallik, R. Gulati, M. S. Obaidat, and P. V. Krishna, ‘‘A deep
learning based artificial neural network approach for intrusion detection,’’
in Mathematics and Computing, D. Giri, R. N. Mohapatra, H. Begehr, and
M. S. Obaidat, Eds. Singapore: Springer, 2017, pp. 44–53.
[20] H. Zhou, Y. Wang, X. Lei, and Y. Liu, ‘‘A method of improved CNN
traffic classification,’’ in Proc. 13th Int. Conf. Comput. Intell. Secur. (CIS),
Dec. 2017, pp. 177–181.
[21] X. Yuan, C. Li, and X. Li, ‘‘DeepDefense: Identifying DDoS attack via
deep learning,’’ in Proc. IEEE Int. Conf. Smart Comput. (SMARTCOMP),
May 2017, pp. 1–8.
[22] J. Kim, J. Kim, H. L. T. Thu, and H. Kim, ‘‘Long short term memory
recurrent neural network classifier for intrusion detection,’’ in Proc. Int.
Conf. Platform Technol. Service (PlatCon), Feb. 2016, pp. 1–5.
[23] W. Wang et al., ‘‘HAST-IDS: Learning hierarchical spatial-temporal features
using deep neural networks to improve intrusion detection,’’ IEEE
Access, vol. 6, pp. 1792–1806, 2018.
[24] T. Wang, Z. Guo, H. Chen, and W. Liu, ‘‘BWManager: Mitigating denial
of service attacks in software-defined networks through bandwidth prediction,’’
IEEE Trans. Netw. Service Manage., vol. 15, no. 4, pp. 1235–1248,
Dec. 2018.
[25] V. F. Taylor, R. Spolaor, M. Conti, and I. Martinovic, ‘‘Robust smartphone
app identification via encrypted network traffic analysis,’’ IEEE Trans. Inf.
Forensics Security, vol. 13, no. 1, pp. 63–78, Jan. 2018.
[26] M. Matsugu, K. Mori, Y. Mitari, and Y. Kaneda, ‘‘Subject independent
facial expression recognition with robust face detection using a convolutional
neural network,’’ Neural Netw., vol. 16, nos. 5–6, pp. 555–559,
Jul. 2003.
[27] R. K. Ahuja, Network Flows: Theory, Algorithms, and Applications.
London, U.K.: Pearson Education, 2017.
[28] Y. LeCun et al. (2015). LeNet-5, Convolutional Neural Networks. [Online].
Available: http://yann.lecun.com/exdb/lenet
[29] S. Hochreiter and J. Schmidhuber, ‘‘Long short-term memory,’’ Neural
Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[30] S. Feghhi and D. J. Leith, ‘‘A Web traffic analysis attack using
only timing information,’’ IEEE Trans. Inf. Forensics Security,
vol. 11, no. 8, pp. 1747–1759, Aug. 2016. [Online]. Available:
http://dx.doi.org/10.1109/TIFS.2016.2551203
[31] M. Shen, M. Wei, L. Zhu, and M. Wang, ‘‘Classification of encrypted traffic
with second-order Markov chains and application attribute bigrams,’’
IEEE Trans. Inf. Forensics Security, vol. 12, no. 8, pp. 1830–1843,
Aug. 2017.
[32] D. P. Kingma and J. Ba, ‘‘Adam: A method for stochastic optimization,’’ in Proc.
Int. Conf. Learn. Representations (ICLR), vol. 5, 2015, pp. 1–9.
[33] I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, ‘‘Toward generating a
new intrusion detection dataset and intrusion traffic characterization,’’ in
Proc. ICISSP, Jan. 2018, pp. 108–116.
[34] H. Huang, H. Deng, J. Chen, L. Han, and W. Wang, ‘‘Automatic multi-task
learning system for abnormal network traffic detection,’’ Int. J. Emerg.
Technol. Learn., vol. 13, no. 4, pp. 4–20, Jan. 2018.
[35] T. Chen and C. Guestrin, ‘‘XGBoost: A scalable tree boosting system,’’
in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining,
Jun. 2016, pp. 785–794.
[36] L. Breiman, Classification and Regression Trees. Evanston, IL, USA:
Routledge, 2017.
YONG ZHANG received the Ph.D. degree from the Beijing University of Posts and
Telecommunications, China, where he has been an Associate Professor with the School of
Electronic Engineering. His research interests include self-organizing networks, mobile
communications, and cognitive networks.
XU CHEN received the B.S. degree in communication engineering from the Chongqing
University of Posts and Telecommunications, Chongqing, China, in 2017. He is currently
pursuing the B.S. degree with the Beijing University of Posts and Telecommunications,
Beijing, China. His research interests include deep learning and intrusion detection.
LEI JIN received the B.S. and M.S. degrees in electrical and electronics engineering from
the Beijing University of Posts and Telecommunications, Beijing, China, in 2015 and 2017,
respectively, where he is currently pursuing the Ph.D. degree with the School of Electronic
Engineering. His current research interest includes artificial intelligence.
XIAOJUAN WANG received the Ph.D. degree in electrical and electronics engineering from the
Beijing University of Posts and Telecommunications, Beijing, China, where she is currently
an Associate Professor with the School of Electronic Engineering. Her current research
interest includes artificial intelligence.
DA GUO received the Ph.D. degree in electrical engineering from the Beijing University of
Posts and Telecommunications, where he is currently a Senior Engineer. His research
interests include mobile communications, opportunistic networks, WSN, and P2P networks.