Network Intrusion Detection: Based On Deep Hierarchical Network and Original Flow Data

This document proposes a new deep hierarchical network model for network intrusion detection. It extracts raw data features from network flows without manual feature engineering. This preserves more original information from the flows. The model integrates improved LeNet-5 and LSTM networks to learn both spatial and temporal features simultaneously. Experiments on two datasets show this hierarchical network performs better than other models, achieving the best detection accuracy. It also analyzes important flow features contributing to abnormal detection and their meanings.

Received February 23, 2019, accepted March 9, 2019, date of publication March 20, 2019, date of current version April 3, 2019.
Digital Object Identifier 10.1109/ACCESS.2019.2905041

Network Intrusion Detection: Based on Deep Hierarchical Network and Original Flow Data
YONG ZHANG, XU CHEN, LEI JIN, XIAOJUAN WANG, AND DA GUO
School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
Corresponding author: Xu Chen ([email protected])
This work was supported by the National Natural Science Foundation of China under Grant 61601053.

ABSTRACT Network intrusion detection plays a very important role in protecting computer network security. Abnormal traffic detection and analysis based on extracting the statistical features of flows is the main analysis method in the field of network intrusion detection. However, these features need to be designed and extracted manually, which often loses the original information of the flow and leads to poor detection efficiency. In this paper, we do not manually design the features of the flow but directly extract the raw data of the flow for analysis. In addition, we propose, for the first time, a new network intrusion detection model named the deep hierarchical network, which integrates the improved LeNet-5 and LSTM neural network structures and learns the spatial and temporal features of the flow at the same time. By designing a reasonable network cascading method, we can train the proposed hierarchical network as a whole instead of training two networks separately. We use the CICIDS2017 dataset and the CTU dataset. The number and types of flows in these two datasets are large, and the attack types are relatively new. The experimental results show that the proposed hierarchical network model performs significantly better than other network intrusion detection models and achieves the best detection accuracy. Finally, we also present a method for analyzing the traffic features that contribute most to abnormal traffic detection, and give the actual meanings of these important features.

INDEX TERMS Network intrusion detection, deep hierarchical network, raw feature, feature importance.

The associate editor coordinating the review of this manuscript and approving it for publication was Zehua Guo.

I. INTRODUCTION
The continuous expansion and rapid development of the Internet have brought great convenience to network users, but they have also been accompanied by a series of attacks. A targeted attacker launches attacks against a specific network to make it crash, so that it can no longer provide users with safe and reliable services, resulting in huge economic losses. The task of network intrusion detection is to discover suspicious attacks [1] and take corresponding measures to protect the network from sustained attacks and reduce economic losses. Traffic classification is an important task of network intrusion detection [2]. It requires researchers to accurately judge the collected traffic datasets and detect traffic with attack behaviors. Traffic classification mainly analyzes some key fields in the traffic packets to determine whether the network is attacked or has abnormal behaviors that violate network security. According to the classification results of the traffic, a feedback message is sent to the network to determine whether the network needs to disconnect or raise an alarm.

In order to detect abnormal traffic efficiently, traffic packets are usually divided into flows [3] according to the source IP, destination IP, source port, destination port, protocol, and timestamp. At present, there are mature traffic detection technologies, including port-based, payload-based, and statistical feature-based methods.

The port-based traffic detection method [4] was commonly used and effective in the early days of the Internet, when network protocols were relatively simple and specific applications basically used fixed port numbers. Therefore, when an application is attacked by other applications, abnormal traffic packets can be effectively detected based on the port number. However, with the advent of dynamic port allocation technology, ports can be easily redirected. Therefore, the port-based traffic detection method cannot adequately express the traffic attributes of the network, and the traffic detection effect is often poor.
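As a rough illustration of how packets are divided into flows by their five-tuple, the following minimal sketch groups already-parsed packets by source IP, destination IP, source port, destination port, and protocol, ordered by timestamp. The `Packet` type, its field names, and `group_into_flows` are hypothetical and introduced only for this example; the paper itself relies on an external tool for this step, and treating each direction as a separate flow is an assumption made here.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical parsed packet; the field names are illustrative only."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    timestamp: float


FiveTuple = Tuple[str, str, int, int, str]


def group_into_flows(packets: List[Packet]) -> Dict[FiveTuple, List[Packet]]:
    """Group packets sharing a five-tuple into one flow, in timestamp order."""
    flows: Dict[FiveTuple, List[Packet]] = defaultdict(list)
    for pkt in sorted(packets, key=lambda p: p.timestamp):
        key = (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port, pkt.protocol)
        flows[key].append(pkt)
    return dict(flows)


if __name__ == "__main__":
    demo = [
        Packet("10.0.0.1", "10.0.0.2", 1234, 80, "TCP", 2.0),
        Packet("10.0.0.1", "10.0.0.2", 1234, 80, "TCP", 1.0),
        Packet("10.0.0.3", "10.0.0.2", 5555, 53, "UDP", 1.5),
    ]
    for key, pkts in group_into_flows(demo).items():
        print(key, [p.timestamp for p in pkts])
```

A bidirectional variant would normalize the key so that both directions of a connection map to the same flow; which convention applies depends on the splitting tool in use.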

2169-3536 © 2019 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
VOLUME 7, 2019
Y. Zhang et al.: Network Intrusion Detection: Based on Deep Hierarchical Network and Original Flow Data

The payload-based traffic detection method [5], [6] uses the information of the application layer protocol to express the features of the traffic, the most representative of which is deep packet inspection (DPI) technology [7]. DPI needs to decrypt and encrypt the transmitted traffic data. By modeling and analyzing the transmitted data, malicious traffic packets can be detected very effectively. Although DPI is widely used for abnormal traffic detection in practical applications, with the rise of encryption protocols (such as HTTPS) and the increasing emphasis on privacy, it is no longer recommended. In addition, decrypting traffic for DPI is very expensive: with the rapid growth of Internet traffic, DPI consumes huge computing resources when decrypting traffic packets.

The statistical feature-based traffic detection method [8] generally uses the packet arrival time, the packet size, and the statistical features of the traffic packet fields (e.g., average, maximum, minimum) to express the attributes of the traffic. Using these manually designed features with machine learning algorithms to analyze and detect abnormal traffic [9] has become a relatively reliable method, but the traffic data needs to be accurately labeled when training a supervised model.

In previous work, researchers mainly operated at the data level to improve the classification accuracy and other metrics. Whether with traditional machine learning algorithms or with the various neural network algorithms of deep learning, researchers try to extract information from traffic data through complex feature engineering. Their feature engineering can extract the temporal and spatial features of the flow data, but it loses some information or changes the original temporal and spatial features of the traffic packets. Yeo et al. [10] extracted temporal features such as fiat, biat and duration, while Yu et al. [11] extracted temporal features such as activation time of flow, time interval, and packet arrival time, and spatial features such as packet number, IP address, and transmission direction. With the traffic features they extracted, algorithms can only use incomplete traffic data to perform classification, so the classification accuracy and other metrics have reached a bottleneck and can hardly continue to improve.

This paper uses deep learning to classify flows. The neural network models of deep learning can automatically extract features from the input data for training. They have good self-adaptation, self-organization, and generalization ability, which gives the system higher detection efficiency. The proposed method uses only the original information of the traffic data as the features of the flow, and uses the hierarchical network structure to automatically learn the spatial and temporal features of the flow without complex feature engineering. By analyzing the experimental results, we find that the spatial and temporal features extracted by the separate CNN and LSTM models have shortcomings similar to those of feature engineering. The data does contain rich features that are useful for classification, but since the separate CNN and LSTM utilize only the spatial feature or only the temporal feature of the flow, respectively, this is equivalent to discarding some information. So if we want to further improve the classification accuracy and other metrics, we need to extract the spatial and temporal features of the flow simultaneously using a hierarchical network. Code has been released at https://github.com/chenxu93/abnormal-traffic. The main contributions of this paper are as follows:

(1) We propose a new method for extracting flow features, which preserves as much of the information of the flow as possible. The flow features we extract do not require any prior knowledge, so we do not need to manually extract flow features with specific meanings.
(2) For the first time, we propose a new deep hierarchical network model structure that learns temporal and spatial features simultaneously from the original flow data. Our model achieves the best performance on all metrics.
(3) We propose a method to analyze the flow features, which can find the features that contribute significantly to abnormal flow detection, and we give the true meanings of these important features.

The structure of this paper is as follows. Section II describes some of the related work on network intrusion detection. Section III details the abnormal flow classification and detection model used in this paper. In Section IV, we describe the two datasets used in this paper and show the experimental results on the two datasets. In Section V, we analyze the flow features that contribute most to abnormal flow detection. Finally, Section VI concludes this article.

II. RELATED WORKS
The concept of intrusion detection technology was first proposed by Anderson [12] in 1980, with the goal of identifying anomalous behaviors in the network and reducing the losses of the network by taking appropriate measures against abnormal behaviors. Currently, many researchers perform normal or abnormal classification by extracting character or numeric features from traffic packets.

Fahad et al. [13] proposed a Global Optimization Approach (GOA) and used feature selection methods to classify spatial and temporal domain traffic data with 97% accuracy. Bang et al. extracted the temporal and spatial features of traffic data from LTE signaling and used the semi-Markov model to detect attacks in wireless sensor networks. Their method can effectively separate attack nodes, and the false positive rate is very low [14]. Yang [15] proposed a new type of abnormal network traffic detection algorithm for the cloud computing environment. They proposed an Ent-SVM abnormal traffic detection system framework mainly considering the source IP address number, source port number, destination IP address number, destination port number, packet type number and network packet number. By calculating
the mixed information entropy and the eigenvalues of the canonical network, the SVM algorithm is used for intrusion detection. The proposed model can detect abnormal network traffic with high precision on large-scale datasets. Ertam and Avcı [16] proposed a GA-WK-ELM network traffic classification model. They use genetic algorithms to select the best parameters for the wavelet kernel Extreme Learning Machine (WK-ELM). Through the adjustment of parameters, the accuracy of traffic classification exceeds 95%. Nezhad et al. [17] extracted the number of packets and the number of source IP addresses per minute from the network traffic as indicators to detect DoS and DDoS attacks. They built a time series of packet numbers and normalized it using the Box-Cox transformation. An ARIMA model is used to predict the number of packets every other minute, and then the chaotic behaviors of the prediction error time series are detected by calculating the maximum Lyapunov exponent. Through simulation, it is found that the number of data packets and the number of source IP addresses increase sharply during the attack time, and the classification accuracy for normal and attack traffic reaches 99.5%. Li et al. [18] proposed a multi-layer anomaly traffic detection model, which extracts the features of different network layers and uses PCA and random forest algorithms to remove redundant features. The detection accuracy and false positive rate of the model are improved by obtaining high-quality features. Roy et al. [19] designed a response feature from the KDD Cup99 dataset and classified the traffic using a deep neural network. The experimental results show that the deep neural network has better classification accuracy than SVM. Zhou et al. [20] extracted 256 features from the flow and mapped them into 16*16 grayscale images, and then used an improved convolutional neural network to classify flows. Their model classifies data types with large data volume well, but performs very poorly on data types with small data volume. Yuan et al. [21] proposed a recurrent neural network model for deep learning. They extracted 20 fields from continuous flow packet sequences and generated a three-dimensional feature map using a sliding time window of length T. The experiment found that the proposed model reduced the error rate by about 5 percentage points compared to traditional machine learning algorithms. Kim et al. [22] used the LSTM network to perform five-class classification on the KDD dataset. Although the classification results are ideal, the KDD dataset is too old and contains only four types of attacks. These types of attacks are no longer sufficient for today's network intrusion detection research.

However, we found that the previously mentioned methods use different flow features, and the datasets used were released long ago and do not include some recent attack types. In addition, most researchers use a shallow classification model, which can achieve good classification results when the feature dimension is small, but when the amount of data and the feature dimension are large, the classification effect will be poor.

In this paper, we do not manually design and extract character or statistical features from the flow, but extract the original hexadecimal codes in the flow and map them into equal-length decimal numbers as the features of the flow. We designed an improved deep hierarchical network model to classify flows, and the CICIDS2017 dataset and CTU dataset were used in the experiments. These two new datasets contain a large number of traffic packets and attack types. The experimental results show that the proposed model can still achieve 99.9% classification accuracy with more types and larger volumes of traffic. In the experimental section, we compare in detail with the existing methods of Wang et al. [23], and find that our model has fewer parameters and a very low miss detection rate; the experiments also show that our model converges rapidly.

The differences between our proposed solution and existing methods such as BWManager [24] and the LTE signaling attack detection scheme [14] include: 1) We use deep learning methods rather than traditional machine learning algorithms or statistical learning methods. 2) Our method uses the original traffic data generated by network users for analysis and detection, rather than analyzing the resources of the communication system for attack detection. 3) Our approach can not only detect network attacks in specific networks such as SDN, but also detect most common attacks on the Internet, and only requires the traffic data generated by these networks. Therefore, our method can detect a large number of attack types and, more importantly, by using deep learning it can meet the needs of attack detection in a big data environment.

III. METHODOLOGY
In this section, we design an anomaly traffic detection model named the deep hierarchical network. The deep hierarchical network consists of two layers of neural network models. The first layer is based on the improved LeNet-5 convolutional neural network and extracts the spatial features of the flow, and the second layer uses the LSTM network to extract the temporal features of the flow. The two networks are cascaded into a hybrid network and trained simultaneously, so that the network automatically extracts the spatial and temporal features of the flow. Before introducing the deep hierarchical network, we first introduce the composition of the traffic data used to train the model.

A. DATA PREPROCESSING
In this paper, the original traffic packets are used as the input for network intrusion detection analysis. Compared with the commonly used manual feature extraction methods, the method we propose can retain all the feature information of each traffic packet. We do not need to filter or design the traffic features that need to be extracted. In Wireshark we can see that the original traffic packets are hexadecimal codes, as shown in FIGURE 1.

The process of extracting traffic features is as follows:
(1) data: Each traffic packet has an Ethernet layer, a network layer, a transport layer, and an application layer. In this


Algorithm 1 Original Flow Data Extraction

Input: network traffic pcap files.
Output: original flow data and its labels.
Step 1: transform pcap files to txt files.
for each pcap do
    Get flows based on the five-tuple information of the traffic packets.
    for each flow do
        Transform the flow pcap file into a txt file with tshark to get the flow's original hexadecimal data.
    end
end
Step 2: extract the original flow features from the txt files.
Create an empty list for the feature vectors of all flows, flows=[].
for each txt file do
    Create an empty list for the feature vectors of the packets in the flow, flow_feature=[].
    for each packet do
        Create an empty list for the feature vector of the packet, pkt_feature=[].
        Get the 16th to 175th hexadecimal bytes of the packet to generate the packet feature vector.
        if the number of bytes in the packet is less than 176 do
            Fill with 0 up to the 175th byte.
        Update the packet feature vector, pkt_feature.
        Add the pkt_feature vector to the flow_feature vector.
    end
    if the number of packets in the flow is less than 10 do
        Fill with 0 up to the 10th packet.
    Label the flow based on the five-tuple information, in the last dimension of the flow_feature vector.
    Add the flow_feature vector to the flows list.
end
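The truncation and zero-padding in Step 2 of Algorithm 1 can be sketched in a few lines. This is a minimal illustration rather than the authors' released code: it assumes each packet has already been read from the txt file as one hexadecimal string, and it interprets "the 16th to 175th bytes" as zero-based offsets 16 to 175 (160 bytes after the skipped header bytes). The names `packet_features` and `flow_features` are hypothetical, and labeling is omitted.

```python
from typing import List

PKT_START = 16   # assumed zero-based offset of the first byte kept
PKT_BYTES = 160  # bytes kept per packet
FLOW_PKTS = 10   # packets kept per flow


def packet_features(hex_str: str) -> List[int]:
    """Map one packet's hex dump to 160 decimal byte values, zero-padded."""
    raw = bytes.fromhex(hex_str)
    kept = list(raw[PKT_START:PKT_START + PKT_BYTES])
    return kept + [0] * (PKT_BYTES - len(kept))


def flow_features(packets_hex: List[str]) -> List[int]:
    """Build the 1600-dimensional vector from the first 10 packets of a flow."""
    rows = [packet_features(p) for p in packets_hex[:FLOW_PKTS]]
    # Pad flows with fewer than 10 packets using all-zero packets.
    rows += [[0] * PKT_BYTES for _ in range(FLOW_PKTS - len(rows))]
    return [value for row in rows for value in row]


if __name__ == "__main__":
    short_packet = "00" * 16 + "ff" * 4   # only 4 bytes after the skipped header
    long_packet = "ab" * 200              # longer than 176 bytes, gets truncated
    print(len(flow_features([short_packet, long_packet])))  # 1600
```

Keeping the byte values as integers in the range 0 to 255 makes the later reshape into a grayscale image straightforward.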

paper, we do not use the data of the Ethernet layer or the network layer's Version and Differentiated Services fields. The three fields of the Ethernet layer are the MAC source address, the MAC destination address, and the protocol version. According to Anderson et al.'s [12] analysis of the features of the flow, these fields are usually not used as features of the traffic packets. The first line in FIGURE 1 is the raw data of a traffic packet that we discard.

FIGURE 1. Raw traffic packet data.

(2) split: We use the SplitCap tool to split traffic packets with the same five-tuple information into a flow [25]. In the obtained flows, we found that the number of traffic packets contained in different flows is not the same within a given time period. So we do not use all the traffic packets in a flow.

(3) vectorization: Statistics show that the number of traffic packets in most flows is less than 10, but in some flows it is greater than 10 or even exceeds 100. Since the payload length of each traffic packet is not equal, in order to use our raw data to train our classification model, we extract only 160 bytes from each traffic packet as the packet features. If a packet is shorter than 160 bytes, we pad it with zeros. If a packet is longer than 160 bytes, we take only the first 160 bytes. In order to give the data sent to the model the same dimensions, we use only the first 10 traffic packets of each flow. So, for each flow we extract 1600-dimensional raw data. The original flow data extraction method is shown in Algorithm 1.

B. CNN MODEL
The convolution operation of a CNN has good spatial sensing ability; it is widely used in image processing tasks such as face recognition [26] and has achieved good results. In the network, the traffic packets generated by users are fragmented during the transmission process [27], and the IP field of each traffic packet indicates the spatial features of the flows. Considering the spatial features of traffic data, the first layer of our deep hierarchical network uses the CNN model to extract the spatial features of traffic packets.

This paper uses the improved LeNet-5 network structure [28], which is a classic handwritten digit recognition
CNN network. In this paper, the 1600-dimensional features are first converted into a 40*40 grayscale image as the input to the CNN input layer. The hidden layer of the CNN uses two convolution layers and two maximum pooling layers to perform spatial feature extraction on the original flow data. The first convolution layer uses 32 convolution kernels of size 5*5, followed by a maximum pooling operation. The second convolution layer uses 64 convolution kernels of size 3*3, followed by a maximum pooling operation. After each convolution, the hidden layer first applies the ReLU activation function and then the maximum pooling operation. The original 40*40 grayscale image becomes an 8*8 feature map with 64 channels.

After flattening the 8*8*64 feature map, a 4096-dimensional vector is obtained and sent to the output layer of the CNN. The output layer is a fully connected layer with 1600 neurons. The purpose is to keep the same feature dimension as the original traffic data after spatial feature extraction. In addition, in order to prevent over-fitting, a dropout operation is performed after the fully connected layer to randomly inactivate some of its neurons. The CNN network structure used in this paper is shown in FIGURE 2.

FIGURE 2. CNN network structure model.

The convolution operation uses an f*f convolution kernel \omega to perform a sliding convolution on an n*n picture, and each sliding convolution produces a new feature. Suppose X is the input of the convolution, b is the bias term, c_i is the new feature produced by the convolution at the i-th layer, and \sigma_r is the ReLU activation function. Then the new features obtained by the convolution operation are:

c_i = \sigma_r(\omega * X_i + b_i)    (1)

After the convolution operation with a sliding window of size f*f, an n*n feature map generates a feature map c of size (n - f + 1) * (n - f + 1). Maximum pooling is then carried out on the feature map c, and the maximum value in the selected window is taken as the final feature. With a 2*2 pooling window, the final feature map size is [(n - f + 1)/2] * [(n - f + 1)/2].

C. LSTM MODEL
The Recurrent Neural Network (RNN) in deep learning is widely used in speech processing, and has achieved good results in speech recognition and time series processing. In the traffic data, the traffic packets are transmitted in chronological order, and they also arrive in sequence at the receiving end because of delay. At the same time, the number of traffic packets sent within a given time period varies. These characteristics indicate that traffic packets have temporal features. This paper uses the LSTM [29] network structure, which is a variant of the RNN. The cell structure in the LSTM algorithm determines whether or not a piece of information is kept as useful. Since the cell contains the operations of the input gate, the forget gate, and the output gate, it handles long-sequence dependencies well. The cell structure is shown in FIGURE 3.

FIGURE 3. LSTM cell.

The calculation performed by each neuron node in the LSTM is as follows:
(1) forget gate: The first step in the LSTM network is to determine the information to be discarded from the cell, which is done through the forget gate layer. The forget gate first reads the data information of the previous hidden layer
h_{t-1} and the input layer x_t, and then outputs a value between 0 and 1 for the cell state C_{t-1} through the activation function. 1 indicates that the information is completely retained, and 0 indicates that it is completely discarded.

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)    (2)

(2) input gate: The input gate determines how much new information is stored in the cell state. The update of the cell state consists of two steps: first, the input gate layer (sigmoid layer) determines the values to be updated in the cell, and then the tanh layer creates a candidate value vector \tilde{C}_t to be added to the cell state.

i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)    (3)
\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)    (4)
C_t = f_t * C_{t-1} + i_t * \tilde{C}_t    (5)

(3) output gate: In order to determine the final output value, it is necessary to determine the state of the cell. First, the sigmoid layer determines which parts of the cell state to output. Then the cell state is passed through a tanh layer and multiplied by the output of the sigmoid layer to give the final output value. The purpose of the tanh layer is to map the cell state values to between -1 and 1.

o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)    (6)
h_t = o_t * \tanh(C_t)    (7)

The arrival times of the traffic packets in each flow differ, as do the values of fields such as TTL. Unlike the temporal feature methods of Feghhi and Leith [30] and Shen et al. [31], this paper uses the LSTM network to perform automatic temporal feature extraction on the original flow data. In this paper, the LSTM network uses two layers of cells for temporal feature extraction. Each LSTM cell uses 256 hidden units. The cell activation function of each layer is the sigmoid function. The last layer of the LSTM network is a fully connected layer, and the number of neurons in it is equal to the number of flow classes.

D. DEEP HIERARCHICAL NETWORK
Ahuja [27] showed that network flows contain a large number of features that can be analyzed. However, these features are based on statistics. Such hand-designed features, used with traditional algorithms, cannot express the temporal and spatial characteristics of flows. They transform the intrinsic features of flows from the very beginning and also lose some of the features of flows, so the high-level semantics of flows cannot be fully represented. The CNN and LSTM networks, with greater depth and using the original flow data, can learn highly semantic features and improve the performance on all metrics.

Since the CNN and LSTM networks can only extract the spatial features and temporal features of flows separately, and cannot fully express all the feature information of the traffic, this paper extracts the spatial and temporal features of the flow simultaneously by combining the CNN and LSTM networks into a hybrid network structure model. The hybrid network structure model is divided into two parts. Since the inputs to the CNN and LSTM network structures have different forms, we reshape the spatial features output by the CNN at the junction of the CNN and the LSTM network. Each flow keeps its first 10 traffic packets, and each traffic packet keeps only its first 160 bytes. To correspond to each traffic packet, we set the input size of the LSTM network to 160 and the number of input time steps to 10. The output of the deep hierarchical network model is the probability of belonging to a certain kind of flow, and its structure is shown in FIGURE 4.

In this paper, the deep hierarchical network model uses the softmax classifier, which outputs the class probability of each type of flow. The index with the highest probability is the classification result of the hierarchical network for a flow. The loss function used in the model is the mean square loss function, and the training optimizer is AdamOptimizer [32], which uses adaptive moment estimation for gradient descent. The training and testing process of the deep hierarchical network structure model is shown in Algorithm 2.

IV. EXPERIMENTS
In this section, we perform three different experiments on the CICIDS2017 dataset and the CTU dataset. In the first experiment, we use the CNN model to extract the spatial features of the flow to classify it. In the second experiment, we use the LSTM model to extract the temporal features of the flow to classify it. In the third experiment, we extract the spatial and temporal features of the flow using the proposed deep hierarchical network model to classify it. Our experimental environment is as follows:
CPU: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
GPU: GTX1080ti 11GB
RAM: 32GB
OS: Ubuntu 16.04

A. DATASET
As described by Weller-Fahy et al. [1], a key issue with most intrusion detection datasets is the lack of a sufficient number and variety of traffic packets. This article uses two different datasets to conduct experiments. Both were recently released, contain more traffic and more traffic types, and provide more reliable validation and test data than other datasets.

(1) CICIDS2017 Dataset
The CICIDS2017 dataset is an intrusion detection and intrusion prevention dataset that was open sourced by the Canadian Institute for Cybersecurity in 2017. Sharafaldin et al. [33] designed a real attack scenario to collect traffic data by designing an attack network and a victim network. This dataset collects benign traffic and the most common attack traffic from Monday to Friday and

VOLUME 7, 2019 37009

Y. Zhang et al.: Network Intrusion Detection: Based on Deep Hierarchical Network and Original Flow Data

FIGURE 4. Deep hierarchical network structure model.

Algorithm 2 Training and Testing Process of the Deep Hierarchical Network

Input: Network flow images; each flow image includes 10 packets (p1, p2, ..., p10).
Output: Flow category probability list [c1, c2, ..., cn].
Step 1: The CNN model learns spatial features
1. Reshape the 1600-dimensional flow feature into a 40*40 grayscale image.
2. Add the first convolution layer (filter size: 5*5*32), followed by the first max pooling layer of size 2*2.
3. Add the second convolution layer (filter size: 3*3*64), followed by the second max pooling layer of size 2*2.
4. Add a fully connected layer with 1600 neurons and then apply dropout to obtain 1600-dimensional spatial features.
Step 2: The LSTM model learns temporal features
1. Reshape the spatial features into a 10*160 feature map.
2. Add the first LSTM cell with 256 neurons.
3. Add the second LSTM cell with 256 neurons.
4. Add a dense layer whose outputs are the spatial and temporal features.
5. Add a softmax layer to output the probability of each flow class.
Step 3: Train the hybrid model
1. Add a mean square error loss function.
2. While the training iterations are not complete, do
a. Get a mini-batch of the training dataset as the model input.
b. Compute the mean square error loss over the flow classes j = 1, ..., n, where n is the number of flow classes.
c. Update the weights and biases using the Adam optimizer, a gradient descent optimization algorithm.
end
Step 4: Test the model
1. Test the trained model using the test dataset.
2. Return the class of each test flow.
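The layer sizes in Algorithm 2 can be checked with a short shape calculation. The following Python sketch is illustrative only and is not the authors' code. It assumes unpadded ("valid") convolutions with stride 1, non-overlapping 2*2 pooling, and zero padding of short packets and short flows, none of which Algorithm 2 states explicitly. The names `flow_to_image`, `conv_out` and `trace_shapes` are hypothetical.

```python
from typing import List, Sequence, Tuple

PACKETS_PER_FLOW = 10   # packets kept per flow (Algorithm 2 input)
BYTES_PER_PACKET = 160  # bytes kept per packet, so 10 * 160 = 1600
IMAGE_SIDE = 40         # 40 * 40 = 1600 pixels


def flow_to_image(packets: Sequence[bytes]) -> List[List[int]]:
    """Pack the leading bytes of the first packets into a 40x40 grid.

    Zero padding of short packets and short flows is an assumption.
    """
    flat: List[int] = []
    for i in range(PACKETS_PER_FLOW):
        raw = packets[i][:BYTES_PER_PACKET] if i < len(packets) else b""
        flat.extend(raw)                                  # byte values 0..255
        flat.extend([0] * (BYTES_PER_PACKET - len(raw)))  # zero padding
    return [flat[r * IMAGE_SIDE:(r + 1) * IMAGE_SIDE] for r in range(IMAGE_SIDE)]


def conv_out(size: int, kernel: int, stride: int = 1) -> int:
    """Output side length of an unpadded sliding window (conv or pooling)."""
    return (size - kernel) // stride + 1


def trace_shapes(side: int = IMAGE_SIDE) -> List[Tuple[str, Tuple[int, ...]]]:
    """Follow the tensor shape through Steps 1 and 2 of Algorithm 2."""
    shapes: List[Tuple[str, Tuple[int, ...]]] = [("input", (side, side, 1))]
    side = conv_out(side, 5)              # 5*5 convolution, 32 filters
    shapes.append(("conv1", (side, side, 32)))
    side = conv_out(side, 2, stride=2)    # 2*2 max pooling
    shapes.append(("pool1", (side, side, 32)))
    side = conv_out(side, 3)              # 3*3 convolution, 64 filters
    shapes.append(("conv2", (side, side, 64)))
    side = conv_out(side, 2, stride=2)    # 2*2 max pooling
    shapes.append(("pool2", (side, side, 64)))
    shapes.append(("dense", (1600,)))         # fully connected layer
    shapes.append(("lstm_input", (10, 160)))  # 10 time steps of 160 values
    return shapes


if __name__ == "__main__":
    image = flow_to_image([b"\x45\x00" * 100, b"\x17"])
    print(len(image), len(image[0]))
    for name, shape in trace_shapes():
        print(name, shape)
```

Under these assumptions the side length goes 40, 36, 18, 16, 8, so the second pooling layer produces an 8*8*64 tensor, which agrees with the downsampled size reported in the implementation details. With "same" padding the result would be 10*10*64 instead, which is why the unpadded variant is assumed here.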

provides real-world pcap data. On Monday only benign traffic was collected, with no attack traffic. On each day from Tuesday through Friday, the attack network launched attacks on the victim network for a fixed period of time, and the traffic generated was collected. Finally, the authors accurately labeled the flows according to the timestamp of the flow, the source IP, the


destination IP, the source port, the destination port, and the protocol.
This paper extracts the benign flows and 10 types of attack flows from the CICIDS2017 dataset as the training and test data of the deep hierarchical network model, and obtains the flow features from the original flow data as described in section 3. We labeled our generated flows according to the CICIDS2017 data labeling method to get real and reliable labels. The number and type distribution of the extracted flows are shown in TABLE 1.

TABLE 1. CICIDS2017 dataset.

According to the number of flows we extracted in TABLE 1, the percentages of benign flows and port scan attack flows are far greater than those of the other attack types. In the multi-classification training, to deal with the bias caused by data imbalance, we perform random downsampling on benign flows and port scan flows. In the binary classification, we use all the benign flows and attack flows.
(2) CTU Dataset
The CTU dataset is BotNet traffic data collected by CTU University. This dataset contains a large amount of BotNet traffic mixed with normal traffic and background traffic, and takes into account different types of BotNet traffic in different scenarios. The traffic in each scenario is captured in PCAP format and labeled. This paper selects 11 types of traffic generated between April 2017 and April 2018: 1 type of benign traffic and 10 types of BotNet traffic. The specific quantities of the various types of traffic are shown in TABLE 2.

TABLE 2. CTU dataset.

Since the percentages of benign flows and attack flows in the CTU dataset are not very different, we do not need to balance the data when performing multi-classification and binary classification. Although the Viaxmr and Trojan flows make up a relatively small percentage of the total BotNet traffic, these two types are among the more common attack flows, so we still use them for intrusion detection analysis to find suspicious attack behaviors.

B. IMPLEMENTATION DETAILS
We train the hierarchical network with a joint end-to-end training method. The input to the CNN network is a 40*40 grayscale image, the first convolution layer has 32 5*5 kernels, and the second convolution layer has 64 3*3 kernels. The 40*40 grayscale image is finally downsampled to 8*8*64, and a 1600-dimensional feature is output through a fully connected layer. Because the LSTM network takes one input per time step, we divide the 1600-dimensional feature output by the CNN network into a matrix of size 10*160. We divide it into 10 inputs because the 1600-dimensional feature represents 10 consecutive traffic packets in a flow, which preserves the temporal features of the flow. The LSTM network consists of two layers, each with 256 neurons. We trained for 1 epoch on the training set with a mini-batch size of 128. In the test phase we used a mini-batch size of 2000.
The training method uses joint end-to-end training to train both the CNN and LSTM networks. In the forward pass, data flows through the CNN network and then the LSTM network. In the backward pass, the loss is propagated first through the LSTM network and then through the CNN network. The joint end-to-end training method ensures that the hierarchical network learns the temporal and spatial features of flows simultaneously and improves the classification accuracy and other metrics in the test phase.

C. EVALUATION METRICS
Our evaluation of model performance is based on the following metrics:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{8}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{9}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{10}$$
$$\mathrm{F1\text{-}Measure} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{11}$$

Here, TP is the number of positive samples in the test dataset that the model also classifies as positive. FP is the number of samples that are actually negative but are classified as positive. TN is the number of negative samples in the test dataset that the model also classifies as negative. FN is the number of test samples that are actually positive but are classified as negative.

TABLE 4. CTU dataset experiment results.
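As a concrete illustration of Eqs. (8) to (11), the following Python sketch computes the four metrics from binary labels. It is a minimal example rather than the evaluation code used in the paper. The names `Confusion`, `count_confusion` and `safe_div` are hypothetical, the label convention (1 = attack = positive) is an assumption, and returning 0.0 when a denominator is empty is a convention chosen here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Confusion:
    """Binary confusion counts; 1 (attack) is treated as the positive class."""
    tp: int
    fp: int
    tn: int
    fn: int


def count_confusion(y_true: Iterable[int], y_pred: Iterable[int]) -> Confusion:
    """Count TP, FP, TN and FN from 0/1 labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return Confusion(tp, fp, tn, fn)


def safe_div(num: float, den: float) -> float:
    # Convention chosen for this sketch: an empty denominator gives 0.0.
    return num / den if den else 0.0


def accuracy(c: Confusion) -> float:
    return safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)  # Eq. (8)


def precision(c: Confusion) -> float:
    return safe_div(c.tp, c.tp + c.fp)                       # Eq. (9)


def recall(c: Confusion) -> float:
    return safe_div(c.tp, c.tp + c.fn)                       # Eq. (10)


def f1_measure(c: Confusion) -> float:
    p, r = precision(c), recall(c)
    return safe_div(2 * p * r, p + r)                        # Eq. (11)


if __name__ == "__main__":
    c = count_confusion([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
    print(c, accuracy(c), precision(c), recall(c), f1_measure(c))
```

For the multi-class experiments, one common option is to compute these quantities per class in a one-vs-rest fashion and then average them; whether the paper does so is not stated.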

D. RESULTS
Using the original features we extracted from the flows, we perform binary classification and multi-classification with the CNN model, the LSTM model and the deep hierarchical network model respectively. The binary classification experiment classifies flows as normal or abnormal, and the multi-classification experiment classifies flows into one normal class and ten abnormal classes. The experimental results on the CICIDS2017 dataset are shown in TABLE 3, and the experimental results on the CTU dataset are shown in TABLE 4. CNN2 denotes binary classification with the CNN algorithm, CNN11 denotes multi-classification with the CNN algorithm (1 benign plus 10 abnormal flow classes), and CNN+LSTM2 denotes binary classification with our proposed deep hierarchical network. LSTM2, LSTM11 and CNN+LSTM are defined in the same way.
From the experimental results on the CICIDS2017 dataset in TABLE 3, we find that our proposed deep hierarchical network model performs better than the traditional machine learning models of Sharafaldin et al. [33], who manually extracted 80-dimensional features from each flow for learning. Our model achieves better results on the three metrics of precision, recall and F1-measure, improving the classification accuracy by about 3%. At the same time, the accuracy metric we report shows that our proposed deep hierarchical network model detects abnormal traffic effectively. Although the proposed hierarchical network model gives only a slight performance improvement over the CNN or LSTM model alone, in a real network environment the amount of traffic data is very large, so it is valuable to detect as many traffic packets with attack behaviors as possible.
On the CTU dataset, the experimental results of our proposed deep network model are shown in TABLE 4 and compared with the experimental results of Huang et al. [34]. Huang et al. report only the two metrics of precision and recall, whereas we report the four metrics of accuracy, precision, recall and F1-measure. The experimental results show that our model is better on both precision and recall. We report the values to higher precision to compare the performance of the models more effectively.

TABLE 3. CICIDS2017 dataset experiment results.

The three experiments with the CNN model, the LSTM model and the deep hierarchical network model on the two datasets show that the CNN and LSTM network models can extract the spatial and temporal features of flows separately. Each of the separate models achieves good classification results in the binary classification and multi-classification experiments. However, our proposed deep hierarchical network model, which extracts the spatial and temporal features of flows at the same time, further improves these classification metrics. This shows that our proposed model can indeed learn deeper abstract features of flows and perform better. The experimental results on both datasets are very good, indicating that our model generalizes well.
To study the influence of input data size and type on the experimental results, we further studied the individual impacts of header data and payload data on classification accuracy. Specifically, for each flow we extract the header and payload of the first five traffic packets. For each traffic packet, we extract the first 50 bytes of the header and of the payload respectively. By padding, we extend a flow to a 256-dimensional feature vector and then reshape it to a size of 16*16. We used the header and payload raw data to conduct multi-classification experiments on the models we designed. The experimental results are shown in TABLE 5.

TABLE 5. Influence of input data size on classification accuracy.

From the experimental results in TABLE 5, we find that the packet header information has more classification capability than the payload information. In particular, when the payload information is used alone, the model does not have the ability


TABLE 6. CICIDS2017 dataset semi-supervised model experimental results.

TABLE 7. CTU dataset semi-supervised model experimental results.

to distinguish the classes in multi-classification. This is because in most cases the differences between payloads transmitted by the same host are not obvious and little payload is transmitted, resulting in a very sparse feature matrix. By comparison, the proposed method extracts the first 160 bytes of each traffic packet, which include both the packet header information and the payload information, and the classification accuracy is further improved under the same network structure. The gain obtained by combining packet header and payload is mainly due to the added application-layer field information, which further enhances the expressiveness of the traffic features and makes the traffic data more distinguishable. In fact, by analyzing the payload part, we found that the obtained feature vectors are very sparse and contain too many 0 elements, which makes our network models unable to distinguish the categories well.
In addition, we used the statistical features of Vlăduţu et al. [9] and a semi-supervised machine learning model, Kmeans+Decision Tree, on the CICIDS2017 and CTU datasets for multi-classification experiments. The experimental results on the two datasets are shown in TABLE 6 and TABLE 7 respectively.
From the experimental results in TABLE 6 and TABLE 7, we found that the classification accuracy on the CICIDS2017 dataset exceeded 94%, but this is still inferior to the results of our proposed deep hierarchical network model. The results on the CTU dataset are more than 10 percentage points worse than those of our proposed model. These results show that statistical features and traditional machine learning algorithms cannot fully express the flow information, which creates a bottleneck in classification accuracy.
Further, we find that the network structure proposed by Wang et al. [23] is partially similar to our model, but our model performs better than their reported results with fewer model parameters and converges very quickly. The difference in the recall metric is very clear: we reach 99.98% whereas they reach only 96.91%, which indicates that our model has a very low missed detection rate. In fact, their model adopts a one-hot-encoding (OHE) operation in the data preprocessing part, which not only introduces a feature engineering operation but also introduces a large number of useless parameters that increase the computational complexity of the model. Because their one-hot-encoding is applied to each traffic packet, for a traffic packet of length n and an OHE vector length of m, their method introduces n(m − 1) zero elements. These zero elements account for a fraction (1 − 1/m) of all elements of the encoded traffic packet, which not only introduces additional computation but also contributes nothing to network learning. Moreover, their hierarchical network structure is very different from ours. In their network structure, the CNN extracts features by convolution for each traffic packet individually. First, spatial feature extraction is carried out for the r traffic packets in a flow, then the feature vectors of the r traffic packets are cached, and finally each feature vector is sent to the LSTM network for temporal feature learning. In this way, their four-layer CNN outputs r one-dimensional vectors for one flow, while our two-layer CNN outputs a single one-dimensional vector for the whole flow and then sends it to the LSTM network according to the time step. This further reduces the number of model parameters and greatly reduces the cache storage space, which is an important reason for the fast convergence of our model. To illustrate the convergence performance of our model in detail, we give the training and test times for multi-classification on the CICIDS2017 dataset; the results are shown in TABLE 8.

TABLE 8. Model convergence analysis.

In TABLE 8, the experimental parameters were set to be the same as in all the above experiments. Only 1 epoch was trained, and the test time was measured for each model on 112,000 test samples. From the experimental results, we found that the CNN was more time-consuming than the LSTM in the training stage, while the LSTM was more time-consuming than the CNN in the test stage. This is because the convolutional neural network finally recovers to the original input data size after downsampling. Many of the 0 parameters in the middle have become non-zero parameters through learning, so they become denser through the fully connected layer. For the LSTM network, since the data is input according to the time step, the calculation time of the model cannot be significantly reduced even in the test stage. The experimental results show that in the test stage our proposed hierarchical network model takes only about 26% more time than a single CNN or LSTM network, and does not require additional computing resources.
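The one-hot-encoding overhead discussed above, n(m − 1) zero elements for a packet of length n and a one-hot vector of length m, can be verified numerically. The Python sketch below is only an illustration of that arithmetic and does not reproduce the preprocessing of Wang et al. [23]. The choice m = 256 (one slot per possible byte value) is an assumption, and the names `one_hot_packet` and `zero_stats` are hypothetical.

```python
from typing import List, Tuple


def one_hot_packet(packet: bytes, m: int = 256) -> List[List[int]]:
    """One-hot encode each byte of a packet as a length-m vector.

    m = 256 (one slot per byte value) is an assumption of this sketch.
    """
    rows: List[List[int]] = []
    for value in packet:
        row = [0] * m
        row[value] = 1  # exactly one non-zero entry per byte
        rows.append(row)
    return rows


def zero_stats(packet: bytes, m: int = 256) -> Tuple[int, float]:
    """Return the number of zero elements and their share of all elements."""
    encoded = one_hot_packet(packet, m)
    total = len(packet) * m
    zeros = sum(row.count(0) for row in encoded)
    return zeros, zeros / total


if __name__ == "__main__":
    n, m = 100, 256
    zeros, share = zero_stats(bytes(range(n)), m)
    print(zeros, n * (m - 1))  # both count the zero elements
    print(share, 1 - 1 / m)    # both give the zero fraction
```

For m = 256 the zero share is 255/256, so more than 99.6% of the encoded input carries no information, which is the motivation for feeding raw byte values directly.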


TABLE 9. Machine learning algorithms convergence analysis.

In order to compare with traditional machine learning algorithms, we trained five classifiers, KNN, Naive Bayes (NB), Logistic Regression (LR), Random Forest (RF) and Decision Tree (DT), on the CICIDS2017 dataset, and each classifier performed multi-classification using the original flow data. The accuracy, training time and test time of these algorithms are shown in TABLE 9. We find that Random Forest achieves the highest classification accuracy, but this is still lower than the result of our proposed algorithm (0.998111). In terms of convergence, these five algorithms are quite different. The test time of KNN is about 8 hours, because the algorithm needs to calculate a large number of Euclidean distances. The test times of Random Forest and Decision Tree are low, because the depth parameter of the tree is set relatively small to prevent overfitting. This indicates that only a small number of features are required to recognize the abnormal traffic, and that these strongly separable features are derived from header fields (TABLE 5 shows that the header is the main separable feature). In addition, because the payload transmitted in the dataset is very small, the feature matrix is very sparse, which is another reason why the test times of the Random Forest and Decision Tree algorithms are reduced. In a real network environment, an attacker usually does not send only a small amount of payload. In that case, the feature matrix will not be sparse and the test time will become longer.

V. IMPORTANT FEATURE ANALYSIS
To explore why our proposed deep hierarchical network model and flow classification method based on original flow data can achieve such high accuracy, we further analyze the features that are important for flow classification in the experiments and give the actual meanings of these important features. In this section, we use three different feature importance analysis methods, take a weighted average of the results of the three methods, and finally give the top nine feature scores.

A. THREE METHODS
The three feature analysis methods are all based on the analysis of ensemble trees: weight-based, gain-based and cover-based. The three methods are described below.

1) WEIGHT-BASED
The weight-based [35] method is currently the most commonly used method. It measures the importance of a feature by counting the number of times the feature is used for a split when constructing the subtrees. The more times a feature is split on during the construction of the ensemble tree, the more important the feature is.

2) GAIN-BASED
The gain-based method is a classical feature importance analysis method proposed by Breiman [36] in 1984. Gain is the reduction in loss or impurity contributed by all the splits on a feature. The gain for feature A in a tree is g(D, A) = H(D) − H(D|A), where H(D) is the information entropy of feature set D in a given tree, and H(D|A) is the conditional entropy of feature set D given feature A. The larger the gain, the more important the feature is.

3) COVER-BASED
The cover-based method measures the relative number of observations associated with a feature in the trees. For example, suppose there are 100 observations, and the ensemble has 3 trees and 4 different features. If the node observations of feature 1 in the three trees are 10, 5 and 2, then the cover value of feature 1 is 17. Similarly, the larger the cover value of a feature, the more important the feature is.

B. RESULTS
Three different feature importance analysis methods were used to analyze the original flow data extracted from the CICIDS2017 dataset. We performed binary classification and multi-classification experiments on the 1600-dimensional features. The experimental results are shown in TABLE 10 and TABLE 11.

TABLE 10. Feature importance analysis: binary classification.

According to the experimental results, in both binary classification and multi-classification the features selected by the three feature importance analysis methods overlap to some extent. This suggests that these repeated features do have a large impact on the classification results. We add up the feature scores obtained by the three methods and map the features to the fields of a TCP packet to give their actual meanings. The results of the analysis are shown in Figure 5. The actual meaning of each field is shown in TABLE 12.
According to the actual meanings of the fields of a TCP packet, we find that for multi-classification the TCP payload field has the greatest impact on flow classification, and for binary classification the fragment offset field has the greatest impact. Combined with


the urgent pointer, window size and acknowledge number fields, we can conclude that malicious flows are usually sent out in more slices. We found that several of the top 9 features were previously rarely used by researchers in the field of network intrusion detection.

TABLE 11. Feature importance analysis: multi-classification.

TABLE 12. The actual meanings of the fields.

FIGURE 5. Important feature weighted score.

VI. CONCLUSION
We consider that the manual design and extraction of flow features for network intrusion detection loses part of the traffic information and thus affects the detection accuracy. In this paper, we extract the original information of flows and use our proposed hierarchical network to detect abnormal flows. Our hierarchical network is a specially designed CNN and LSTM model that learns spatial and temporal features from the original flow information. To the best of our knowledge, this is the first time that the original information of flows is used for feature learning. The hierarchical network model we propose is significantly better than other network intrusion detection models. In this paper, we use the CICIDS2017 dataset and the CTU dataset. The experimental results on these two datasets show that our proposed model achieves very high accuracy, precision, recall and F1-measure. At the same time, we analyzed the features that have contributed significantly to abnormal traffic detection and found features that were rarely used by previous researchers.
In future work, we will design our own traffic collection system and use it to collect real-world traffic data in our lab environment, analyze the data to detect suspicious attack traffic, and evaluate the test results. In addition, we will improve our hierarchical network model by making the network deeper, enabling the model to detect unknown types of attacks that it has not been trained on.

REFERENCES
[1] D. J. Weller-Fahy, B. J. Borghetti, and A. A. Sodemann, "A survey of distance and similarity measures used within network intrusion anomaly detection," IEEE Commun. Surveys Tuts., vol. 17, no. 1, pp. 70–91, Jan. 2015.
[2] M. Ahmed, A. N. Mahmood, and J. Hu, "A survey of network anomaly detection techniques," J. Netw. Comput. Appl., vol. 60, pp. 19–31, Jan. 2016.
[3] J. Zhang, C. Chen, Y. Xiang, W. Zhou, and Y. Xiang, "Internet traffic classification by aggregating correlated naive Bayes predictions," IEEE Trans. Inf. Forensics Security, vol. 8, no. 1, pp. 5–15, Jan. 2013.
[4] A. Dainotti, F. Gargiulo, L. I. Kuncheva, A. Pescapè, and C. Sansone, "Identification of traffic flows hiding behind TCP Port 80," in Proc. IEEE Int. Conf. Commun., May 2010, pp. 1–6.
[5] S. Sen, O. Spatscheck, and D. Wang, "Accurate, scalable in-network identification of p2p traffic using application signatures," in Proc. 13th Int. Conf. World Wide Web, pp. 512–521.
[6] K. Wang, G. Cretu, and S. J. Stolfo, "Anomalous payload-based worm detection and signature generation," in Proc. Int. Workshop Recent Adv. Intrusion Detection, 2005, pp. 227–246.
[7] R. T. El-Maghraby, N. M. A. Elazim, and A. M. Bahaa-Eldin, "A survey on deep packet inspection," in Proc. 12th Int. Conf. Comput. Eng. Syst. (ICCES), Dec. 2017, pp. 188–197.
[8] D. E. Denning, "An intrusion-detection model," IEEE Trans. Softw. Eng., vol. SE-13, no. 2, pp. 222–232, Feb. 1987.
[9] A. Vlăduţu, D. Comăneci, and C. Dobre, "Internet traffic classification based on flows' statistical properties with machine learning," Int. J. Netw. Manage., vol. 27, no. 3, p. e1929, May 2017.
[10] M. Yeo et al., "Flow-based malware detection using convolutional neural network," in Proc. Int. Conf. Inf. Netw. (ICOIN), Jan. 2018, pp. 910–913.
[11] Y. Yu, J. Long, and Z. Cai, "Session-based network intrusion detection using a deep learning architecture," in Modeling Decisions for Artificial Intelligence. V. Torra, Y. Narukawa, A. Honda, and S. Inoue, Eds. Cham, Switzerland: Springer, 2017, pp. 144–155.
[12] J. P. Anderson, "Computer security threat monitoring and surveillance," James P. Anderson Company, Washington, DC, USA, Tech. Rep. 19034, 1980.
[13] A. Fahad, Z. Tari, I. Khalil, A. Almalawi, and A. Y. Zomaya, "An optimal and stable feature selection approach for traffic classification based on multi-criterion fusion," Future Gener. Comput. Syst., vol. 36, pp. 156–169, Jul. 2014.
[14] J.-H. Bang, Y.-J. Cho, and K. Kang, "Anomaly detection of network-initiated LTE signaling traffic in wireless sensor and actuator networks based on a hidden semi-Markov model," Comput. Secur., vol. 65, pp. 108–120, Mar. 2017.
[15] C. Yang, "Anomaly network traffic detection algorithm based on information entropy measurement under the cloud computing environment," Cluster Comput., vol. 21, pp. 1–9, Jan. 2018. [Online]. Available: https://link.springer.com/journal/volumesAndIssues/10586
[16] F. Ertam and E. Avcí, "A new approach for internet traffic classification: GA-WK-ELM," Measurement, vol. 95, pp. 135–142, Jan. 2017.
[17] S. M. T. Nezhad, M. Nazari, and E. A. Gharavol, "A Novel DoS and DDoS attacks detection algorithm using ARIMA time series model and chaotic system in computer networks," IEEE Commun. Lett., vol. 20, no. 4, pp. 700–703, Apr. 2016.
[18] B. Li, S. Zhang, and K. Li, "Towards a multi-layers anomaly detection framework for analyzing network traffic," Concurrency Comput. Pract. Exper., vol. 29, no. 14, p. e3955, Jul. 2017.


[19] S. S. Roy, A. Mallik, R. Gulati, M. S. Obaidat, and P. V. Krishna, "A deep learning based artificial neural network approach for intrusion detection," in Mathematics and Computing, D. Giri, R. N. Mohapatra, H. Begehr, and M. S. Obaidat, Eds. Singapore: Springer, 2017, pp. 44–53.
[20] H. Zhou, Y. Wang, X. Lei, and Y. Liu, "A method of improved CNN traffic classification," in Proc. 13th Int. Conf. Comput. Intell. Secur. (CIS), Dec. 2017, pp. 177–181.
[21] X. Yuan, C. Li, and X. Li, "Deepdefense: Identifying DDoS attack via deep learning," in Proc. IEEE Int. Conf. Smart Comput. (SMARTCOMP), May 2017, pp. 1–8.
[22] J. Kim, J. Kim, H. L. T. Thu, and H. Kim, "Long short term memory recurrent neural network classifier for intrusion detection," in Proc. Int. Conf. Platform Technol. Service (PlatCon), Feb. 2016, pp. 1–5.
[23] W. Wang et al., "HAST-IDS: Learning hierarchical spatial-temporal features using deep neural networks to improve intrusion detection," IEEE Access, vol. 6, pp. 1792–1806, 2018.
[24] T. Wang, Z. Guo, H. Chen, and W. Liu, "BWManager: Mitigating denial of service attacks in software-defined networks through bandwidth prediction," IEEE Trans. Netw. Service Manage., vol. 15, no. 4, pp. 1235–1248, Dec. 2018.
[25] V. F. Taylor, R. Spolaor, M. Conti, and I. Martinovic, "Robust smartphone app identification via encrypted network traffic analysis," IEEE Trans. Inf. Forensics Security, vol. 13, no. 1, pp. 63–78, Jan. 2018.
[26] M. Matsugu, K. Mori, Y. Mitari, and Y. Kaneda, "Subject independent facial expression recognition with robust face detection using a convolutional neural network," Neural Netw., vol. 16, nos. 5–6, pp. 555–559, Jul. 2003.
[27] R. K. Ahuja, Network Flows: Theory, Algorithms, and Applications. London, U.K.: Pearson Education, 2017.
[28] Y. LeCun et al. (2015). Lenet-5, Convolutional Neural Networks. [Online]. Available: http://yann. lecun. com/exdb/lenet
[29] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[30] S. Feghhi and D. J. Leith, "A Web traffic analysis attack using only timing information," IEEE Trans. Inf. Forensics Security, vol. 11, no. 8, pp. 1747–1759, Aug. 2016. [Online]. Available: http://dx.doi.org/10.1109/TIFS.2016.2551203
[31] M. Shen, M. Wei, L. Zhu, and M. Wang, "Classification of encrypted traffic with second-order Markov chains and application attribute bigrams," IEEE Trans. Inf. Forensics Security, vol. 12, no. 8, pp. 1830–1843, Aug. 2017.
[32] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Proc. Int. Conf. Learn. Representations (ICLR), vol. 5, 2015, pp. 1–9.
[33] I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, "Toward generating a new intrusion detection dataset and intrusion traffic characterization," in Proc. ICISSP, Jan. 2018, pp. 108–116.
[34] H. Huang, H. Deng, J. Chen, L. Han, and W. Wang, "Automatic multi-task learning system for abnormal network traffic detection," Int. J. Emerg. Technol. Learn., vol. 13, no. 4, pp. 4–20, Jan. 2018.
[35] T. Chen and C. Guestrin, "Xgboost: A scalable tree boosting system," in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Jun. 2016, pp. 785–794.
[36] L. Breiman, Classification Regression Trees. Evanston, IL, USA: Routledge, 2017.

YONG ZHANG received the Ph.D. degree from the Beijing University of Posts and Telecommunications, China, where he has been an Associate Professor with the School of Electronic Engineering. His research interests include self-organizing networks, mobile communications, and cognitive networks.

XU CHEN received the B.S. degree in communication engineering from the Chongqing University of Posts and Telecommunications, Chongqing, China, in 2017. He is currently pursuing the B.S. degree with the Beijing University of Posts and Telecommunications, Beijing, China. His research interests include deep learning and intrusion detection.

LEI JIN received the B.S. and M.S. degrees in electrical and electronics engineering from the Beijing University of Posts and Telecommunications, Beijing, China, in 2015 and 2017, respectively, where he is currently pursuing the Ph.D. degree with the School of Electronic Engineering. His current research interest includes artificial intelligence.

XIAOJUAN WANG received the Ph.D. degree in electrical and electronics engineering from the Beijing University of Posts and Telecommunications, Beijing, China, where she is currently an Associate Professor with the School of Electronic Engineering. Her current research interest includes artificial intelligence.

DA GUO received the Ph.D. degree in electrical engineering from the Beijing University of Posts and Telecommunications, where he is currently a Senior Engineer. His research interests include mobile communications, opportunistic networks, WSN, and P2P networks.


