Article
FN-GNN: A Novel Graph Embedding Approach for Enhancing
Graph Neural Networks in Network Intrusion Detection Systems
Dinh-Hau Tran 1 and Minho Park 2,3, *
Abstract: With the proliferation of the Internet, network complexities for both commercial and state
organizations have significantly increased, leading to more sophisticated and harder-to-detect net-
work attacks. This evolution poses substantial challenges for intrusion detection systems, threatening
the cybersecurity of organizations and national infrastructure alike. Although numerous deep learn-
ing techniques such as convolutional neural networks (CNNs), recurrent neural networks (RNNs),
and graph neural networks (GNNs) have been applied to detect various network attacks, they face
limitations due to the lack of standardized input data, affecting model accuracy and performance.
This paper proposes a novel preprocessing method for flow data from network intrusion detection
systems (NIDSs), enhancing the efficacy of a graph neural network model in malicious flow detection.
Our approach initializes graph nodes with data derived from flow features and constructs graph
edges through the analysis of IP relationships within the system. Additionally, we propose a new
graph model based on the combination of the graph convolutional network (GCN) model and SAGEConv, a
variant of the GraphSAGE model. The proposed model leverages their strengths while addressing
the limitations encountered by previous models. Evaluations on two IDS datasets, CIC-IDS2017
and UNSW-NB15, demonstrate that our model outperforms existing methods, offering a significant
advancement in the detection of network threats. This work not only addresses a critical gap in the
standardization of input data for deep learning models in cybersecurity but also proposes a scalable
solution for improving intrusion detection accuracy.

Keywords: intrusion detection system (IDS); graph neural network (GNN); deep learning; flow-based characteristic; feature engineering

Citation: Tran, D.-H.; Park, M. FN-GNN: A Novel Graph Embedding Approach for Enhancing Graph Neural Networks in Network Intrusion Detection Systems. Appl. Sci. 2024, 14, 6932. https://doi.org/10.3390/app14166932
packet size, and inter-arrival times are considered. Acting as a sensitive shield, the NIDS
detects external threats and potential risks within the network system. In the face of these
challenges, NIDSs exhibit significant limitations in effectively detecting unknown or zero-day
attacks [4]. Indeed, the primary detection mechanisms of NIDSs, such as signature-based and
anomaly-based detection, are easily circumvented by modern attacks or generate false alarms.
Therefore, integrating new techniques into NIDSs to enhance detection performance is an urgent
requirement for modern network systems.
Recently, machine learning (ML) and deep learning (DL) have been employed in various
fields, such as image processing [5,6], storage systems [7–9], wireless communication [10], and
cybersecurity [11]. Many deep learning approaches, such as convolutional neural networks
(CNNs), recurrent neural networks (RNNs), and traditional multi-layer perceptrons (MLPs),
have also been applied to NIDS to enhance the network monitoring efficiency. However,
these techniques exhibit limited effectiveness when applied to datasets comprising network
flows due to a mismatch between the models and the type of data being monitored by the
IDS. Conventional DL models are often trained on flat data structures, such as vectors or
grid data, rendering them incapable of exploiting the complex structures of network flows.
The information embedded in these complex structures is crucial for detecting advanced
persistent threats (APTs) or zero-day attacks. Furthermore, the employed ML techniques
focus on analyzing individual network flows, neglecting their inter-dependencies, as seen
in [12,13].
Among the various research techniques in deep learning, graph neural network (GNN)
models are particularly well suited for analyzing traffic data. GNN is a subclass of deep
learning techniques designed to operate on graph-structured data, consisting of ‘nodes’
and the connections between them, called ‘edges’. This type of structure is well suited
for representing relationships in various domains, such as social networks, transportation
networks, and molecular structures. Similarly, network traffic, which consists of multiple
flows, can naturally be represented as graph data. Moreover, the mechanism of GNN
models to aggregate the information from neighboring nodes allows them to exploit the
complex structures present in network data. Instead of relying on predefined features,
GNNs can learn relevant features directly from the data. This reduces the dependency
on manual feature engineering and enables the model to discover intricate patterns that
might indicate malicious activity. The information contained in these patterns is crucial for
detecting APT and zero-day attacks. Thus, GNN can significantly improve the performance
of NIDS by utilizing the inherent graph structure of network traffic.
However, GNN-based IDS still has not achieved the desired reliability and stability
level, as the model’s input data has not been optimized before training. Previous research
has mainly focused on creating graph data based on network topology or only using
a component of the graph data, such as nodes or edges. Meanwhile, both nodes and
edges are crucial elements of the graph that need to be simultaneously exploited for the
model to learn contextually relevant information. Therefore, in this study, we propose a
model called the flow-node graph neural network (FN-GNN) to design graph data from
network flows. In this proposed model, the graph data are formed using a completely
new approach. Specifically, the set of the most important features of flows is represented
as nodes. Simultaneously, the edges of the graph are formed by utilizing the correlation
between flows that share the same source IP address. This approach helps generate graph
data that already contain information about the relationships within them. Additionally,
this method preserves the maximum amount of network information since it considers the
entire dataset as a whole rather than treating each data point independently, making this
approach particularly well suited to network flow data.
Furthermore, our research also proposed a new graph model architecture based on
a combination of existing models, GCN and SAGEConv. This combination helps the
new model overcome the limitations of previous models and significantly enhances the
performance of network attack detection. We implemented the model on two benchmark
datasets, CIC-IDS2017 and UNSW-NB15. The experimental results show that the accuracy
of the proposed model reached 99.76% on the CIC-IDS2017 dataset and 98.65% on the
UNSW-NB15 dataset.
We summarize our contributions as follows:
• We proposed the FN-GNN model, a novel approach to represent network flow data as
nodes and edges in graph data.
• We proposed a new graph model architecture combining the GCN and SAGEConv
models, which significantly improves the performance of the intrusion detection system.
• We applied the proposed model to two standard datasets and supplied simulation
results to prove its effectiveness.
The remainder of this paper is organized as follows. Section 2 reviews the relevant
literature and related work. Section 3 provides background information necessary for
understanding the research, including an overview of the NIDS system and GNN mod-
els. Our proposed FN-GNN model is introduced in Section 4. Section 5 describes the
experimental setup, detailing the datasets and evaluation methodology used to assess the
model’s performance. Section 6 presents and analyzes the experimental results, evaluating
the effectiveness of the FN-GNN model compared to the existing method. Finally, Section 7
concludes the paper while also discussing limitations and outlining potential future work.
2. Related Works
In this section, we present recent studies on NIDS models based on graph neural networks,
covering the most common approaches to representing data collected from network traffic
as graphs. We also highlight the differences between the proposed method and existing ones.
A common approach to applying GNN models to NIDS is to represent network
flow data as graphs, with nodes identified by hosts or devices in the network, while the
remaining data are placed into edges. For instance, ref. [14] represented flow data in a graph
format, with network traffic flows mapped to the graph edges and the endpoints as nodes,
whereas paper [15] proposed a method to represent network flows as a graph, where each
node includes flow features in the form of a tuple (IP src, IP dst, port, protocol, request, response).
Another graph representation approach was also presented in paper [16], where all data
are represented as nodes of the graph. Specifically, the authors introduced a heterogeneous
graph constructed from network flows, with three nodes created for each flow: the source
host node, the flow node, and the destination host node. Paper [17] even proposed a
model with graph data generated without including any flow or node features but only
considering the network’s topology. They ignored the edge features and initialized the
node features with vectors of all ones. We recognized the commonality among these studies
that the graphs are constructed based on the inherent network structure, thus resulting in
graph data resembling the topology of computer networks. This approach did not allow
the model to fully exploit the relationships between the generated flows in the network.
This can be explained from the perspective of the GNN model’s concept. According to the
theory of GNN, nodes that are similar to each other are often connected through edges,
while the network system’s topology structure cannot generalize that relationship. This
is one of the main differences between our method and previous methods. In this study,
the graph data are generated by considering the relationships between flows, with each
node in the graph representing the characteristics of a flow. Thus, the nodes in this graph
represent their inherent similarities when connected by edges. This approach aligns with
the theory of graph data representation we discussed earlier.
While approaches like the above primarily classify flows based on edge features, some
studies [18–21] consider both node and edge features. However, their use of edge features is
limited, as they mainly focus on node features and employ edge features only to improve the
passing of messages between nodes. Moreover, in papers [22,23], threats to the system were
not extensively considered, as the proposed models only accounted for packets and flows
transported between specific endpoints within the network. Likewise, the studies [23,24]
Figure 1. The GCN model: node features are transformed through successive convolution layers via message passing, followed by ReLU activation and a linear output layer.
3. Background
3.1. Intrusion Detection Systems
An IDS is a widely used network security technology employed across various enter-
prise and organizational networks. As the name suggests, an IDS is a system or software
established in a network system that has the role of monitoring the traffic and immediately
sending alerts to administrators if malicious activities or policy violations are detected.
The most optimal and common position for an IDS to be placed is “behind-the-firewall”
(strategic points) because this placement enables the IDS to have high visibility into in-
coming network traffic while ensuring that it does not intercept traffic between users
and the network.
The NIDS, which is shown in Figure 2, is one of the two main types of IDS and
it is strategically positioned within the network to analyze the traffic originating from
all devices on the network. Traditional NIDS systems use two main attack detection
methods: signature-based and anomaly-based methods. The signature-based method uses
pre-defined attack signatures through rule sets to effectively identify known threats and
malicious patterns. This method helps the system respond accurately and promptly to
known attacks, but it is largely ineffective at detecting new attacks or other variations
of known attacks. In contrast, the anomaly-based method establishes baselines of
normal network behavior, allowing it to detect anomalous activities that deviate from
regular patterns, flag them as suspicious, and alert the administrator. However, this
method does not guarantee reliability either, as it faces limitations such as high false
alarm rates. To overcome the limitations of traditional NIDSs, many deep learning
techniques, such as CNNs [29,30], RNNs [31], and GNNs [32], have been applied to
NIDS to improve the accuracy and performance of detecting cyber-attacks.
Figure 2. Diagram of the IDS model.
classification. The most crucial concept of a graph neural network is the message-passing
mechanism that is presented in Figure 3.
Figure 3. The message-passing mechanism: information from the input graph is aggregated at the target node.
The GNN propagates information across the graph through a series of message-
passing steps. This mechanism allows the GNN layer to update the hidden state of each
node from its neighborhood nodes. This process is repeated, in parallel, for all nodes in the
graph and thus, the hidden state of the graph is also aggregated and updated continuously
through each GNN layer. In the GNN model, the message-passing mechanism proceeds in
two stages: aggregation and update. First, information from neighboring nodes is compiled
and sent to the node being updated as a ‘message’. Then, the node’s hidden state is updated
by combining this message with the state it stored in the previous layer. After passing
through multiple GNN layers, the resulting output is the final embedded representation of
the graph’s nodes. These embeddings are subsequently employed to address various tasks,
including node classification, graph classification, and link prediction.
Observing the structure of a graph, we notice that nodes with similar features or
properties are often connected. Therefore, the GNN exploits this fact to learn how and why
specific nodes connect while others do not. This is why the message-passing mechanism is
widely regarded as the most critical strength of GNNs.
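As an illustration of this aggregate-and-update cycle, one round of message passing with a mean aggregator can be sketched as follows (the toy graph, identity weights, and ReLU update are invented for the example, not taken from the paper):

```python
import numpy as np

def message_passing_step(h, neighbors, W_self, W_neigh):
    """One aggregate-and-update round: each node's new hidden state
    combines its previous state with the mean 'message' of its neighbors."""
    h_new = np.zeros_like(h)
    for u in range(h.shape[0]):
        # Aggregation: compile the neighbors' states into a single message.
        msg = np.mean(h[neighbors[u]], axis=0)
        # Update: mix the message with the node's own stored state, apply ReLU.
        h_new[u] = np.maximum(0.0, h[u] @ W_self + msg @ W_neigh)
    return h_new

# Toy graph: 3 nodes with 2-dimensional hidden states.
h = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
neighbors = {0: [1, 2], 1: [0], 2: [0]}
W = np.eye(2)                      # identity weights keep the arithmetic visible
h1 = message_passing_step(h, neighbors, W, W)
# Node 1 now holds ReLU([0, 1] + [1, 0]) = [1, 1]: its own state plus node 0's message.
```

Repeating this step for every node in parallel, layer after layer, is exactly the continuous aggregation and update of the graph's hidden state described above.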
3.3.1. GCN
The GCN [33] is one of the most basic graph neural networks designed to operate on
graphs. As the name suggests, it is inspired by the CNN model: it performs a convolution-like
operation on graphs, analogous to the convolution a traditional CNN performs on images.
Fundamentally, a GCN takes as input a graph together with a set of feature vectors, where
each node is associated with its own feature vector. The GCN is then composed of a series
of graph convolutional layers that iteratively
transform the feature vectors at each node. The GCN layers use the message-passing
mechanism previously mentioned to aggregate information from neighboring nodes and
reflect it into the current node’s representation. This same procedure is carried out at every
node. The output of each GCN layer serves as the input data for the subsequent GCN layer.
Consequently, the graph data are transformed into new embedding through the layers,
and the final neural network layer utilizes these embeddings to address tasks such as node
classification and graph classification. The GCN model is described as shown in Figure 1.
The hidden states of nodes at each layer are made up of two consecutive processes:
aggregation and update. This is where the idea of ‘convolutional’ comes into play. The
hidden states of each GCN layer can be updated through the following formula:
H^{(l)} = σ(A H^{(l−1)} W^{(l−1)} + b^{(l)}), (1)
where
A: the adjacency matrix of the graph;
H: the node feature matrix;
W: the GCN layer’s weight matrix;
b: the bias term;
σ: the activation function.
At each node updated in the l-th layer, information is aggregated based on the connections
between that node and its neighbors, which can be seen as the node’s ‘mask’. This mask
plays a role similar to a kernel in a CNN model. The nodes of the graph are sequentially
updated by sliding these ‘masks’ over each vertex and aggregating information there. This
aggregation is typically achieved through the matrix multiplications in the formula above,
and at its core it embodies the essence of ‘convolution’ in aggregating information from
each node’s neighbors. This is the reason behind the name GCN.
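Equation (1) maps directly onto a few lines of NumPy. The sketch below uses a toy 3-node graph with self-loops and identity weights (values invented for illustration); practical GCN implementations additionally normalize the adjacency matrix (e.g., D^{−1/2}(A + I)D^{−1/2}), which is omitted here:

```python
import numpy as np

def gcn_layer(A, H, W, b):
    """One GCN layer: H' = sigma(A H W + b), with ReLU as the activation.
    Multiplying by A sums each node's features with those of its neighbors,
    acting as the 'mask' described above."""
    return np.maximum(0.0, A @ H @ W + b)

# Toy graph of 3 nodes; self-loops included on the diagonal.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
W = np.eye(2)
b = np.zeros(2)
H1 = gcn_layer(A, H, W, b)   # row u = sum of features of u and its neighbors
```

Stacking several such layers, with each output feeding the next, reproduces the layer-by-layer embedding transformation described above.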
3.3.2. GraphSAGE
GraphSAGE [34] stands for graph sample and aggregate. Introduced in 2017, it is a GNN
model designed for large graphs. Unlike models such as GCN and GAT, which aggregate
information from all of a node’s neighbors, GraphSAGE samples a fixed number of neighbors
per node and aggregates only those to update its embedding. This sampling-based
aggregation helps the model overcome the limitations of traditional GNN models when
processing large graph data.
Based on the above idea, the message-passing mechanism in GraphSAGE includes two
processes: neighborhood sampling and aggregation. The sampling operation is denoted by:

N_s(u) = SAMPLE(N(u), s), (2)

where N_s(u) represents the set of neighborhood samples of node u, with s denoting the
number of nodes selected from the full neighbor set N(u) of node u. By choosing s neighbors
for each node, GraphSAGE significantly reduces the size of the computational graph and
the memory requirements, thereby reducing the space and time complexity of the algorithm.
After specifying the number of neighbors for each node, GraphSAGE utilizes aggregation
functions to synthesize information from them:
h_{N_s(u)}^{(l)} = AGG({h_v^{(l−1)}, ∀v ∈ N_s(u)}), (3)

where h_{N_s(u)}^{(l)} represents the information aggregated from the selected neighbors and
AGG is the aggregation function. In GraphSAGE, many types of aggregation functions can be
applied, including sum, mean-pooling, max-pooling, and LSTM.
The aggregated data from that neighborhood are utilized to calculate and update the
embedding for each node, akin to other GNN models:
h_u^{(l)} = σ([h_u^{(l−1)}, h_{N_s(u)}^{(l)}] · W^{(l)}), (4)

where h_u^{(l)} is the embedding of node u at layer l, calculated from the embedding
h_u^{(l−1)} of node u at layer (l−1) and the information h_{N_s(u)}^{(l)} aggregated from its
neighbors; W^{(l)} is the weight matrix used at layer l of the model.
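GraphSAGE’s neighborhood sampling, aggregation, and update steps can be sketched as follows, with mean as the AGG function; the toy graph, sample size, and random weights are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sage_update(h, neighbors, W, s):
    """One GraphSAGE step: sample s neighbors, mean-aggregate their
    embeddings, concatenate with the node's own embedding, transform."""
    h_new = np.zeros((h.shape[0], W.shape[1]))
    for u in range(h.shape[0]):
        # Neighborhood sampling: at most s neighbors, without replacement.
        ns = rng.choice(neighbors[u], size=min(s, len(neighbors[u])), replace=False)
        # Aggregation: AGG chosen here as the mean of the sampled embeddings.
        h_agg = h[ns].mean(axis=0)
        # Update: concatenate [h_u, h_agg], apply the layer weights and ReLU.
        h_new[u] = np.maximum(0.0, np.concatenate([h[u], h_agg]) @ W)
    return h_new

h = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
neighbors = {0: [1, 2, 3], 1: [0], 2: [0, 3], 3: [0, 2]}
W = rng.standard_normal((4, 2))    # maps concatenated 2+2 features back to 2
out = sage_update(h, neighbors, W, s=2)   # only 2 of node 0's 3 neighbors used
```

Because each node touches at most s neighbors, the per-layer cost is bounded regardless of how dense the graph is, which is the source of GraphSAGE's scalability.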
In general, GraphSAGE addresses the limitations of the GCN and GAT models, working
effectively on large graphs with fast training. However, our experiments show that
GraphSAGE does not significantly improve accuracy compared to other models.
4. Proposed Method
In prior research, the authors typically utilized nodes or edges to represent the features
extracted from network traffic flows. However, given the capacity of GNN models to
leverage both nodes and edges within graph data, this paper introduces a novel method for
extracting features from traffic flows, encompassing both nodes and edges. Our proposed
model is presented in Figure 4.
Within the model, flow data represent the communication exchanged between two
computers or devices within the network system. These data are characterized by features
such as source IP, source port, destination IP, destination port, protocol, etc. We employed
the random forest regression algorithm to evaluate the impact and necessity of each feature
based on the ‘important weight’ index. This allowed us to select a subset of relevant
features from the data to be used in our proposed model. Subsequently, these selected
features were used to construct the nodes and edges in the graph network representation,
as illustrated in Figure 5.
Figure 5. Graph construction: flows become nodes, and flows sharing the same source IP are connected by edges to form the graph data.
This feature selection method helps our approach focus on meaningful information in
the flow data and avoid distortions. By doing so, the proposed model can more effectively
exploit the characteristic patterns in the data. This allows the model to achieve higher
accuracy compared to other models that use only a few features from the flow data.
Next, we initialize a graph using the extracted features as nodes. For edge creation, we
leverage the IP addresses present in each flow: edges are established between flows
sharing the same source IP. The output of this preprocessing step is graph data, which
are fed into the subsequent GNN model.
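A minimal sketch of this preprocessing step, using invented flow records with two-dimensional feature vectors (in the actual pipeline each node carries the selected flow features):

```python
from itertools import combinations

def flows_to_graph(flows):
    """Build graph data from flows: each flow becomes a node carrying its
    feature vector; edges connect flows that share a source IP address."""
    nodes = [f["features"] for f in flows]
    by_src = {}
    for i, f in enumerate(flows):
        by_src.setdefault(f["src_ip"], []).append(i)
    edges = []
    for idxs in by_src.values():
        # Connect every pair of flows originating from the same source IP.
        edges.extend(combinations(idxs, 2))
    return nodes, edges

# Hypothetical flows: field names and values are invented for illustration.
flows = [
    {"src_ip": "10.0.0.1", "features": [0.2, 0.9]},
    {"src_ip": "10.0.0.1", "features": [0.3, 0.1]},
    {"src_ip": "10.0.0.2", "features": [0.8, 0.4]},
    {"src_ip": "10.0.0.1", "features": [0.5, 0.5]},
]
nodes, edges = flows_to_graph(flows)
# Flows 0, 1, and 3 share 10.0.0.1, giving edges (0,1), (0,3), (1,3).
```

Note that, in contrast to topology-based constructions, the number of nodes here grows with the number of flows rather than the number of hosts.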
After the preprocessing step, we obtained graph data, used as input for the node
classification model to find suspicious flows in the network. In this study, we propose a
modified version of the GCN model to perform this classification task. As presented in
Figure 6, it consists of two SAGEConv layers and one fully connected output layer; batch
normalization and ReLU activation are applied immediately after each SAGEConv layer.
A softmax function at the end of the model generates predictions by selecting the class
with the highest output probability.
Specifically, the feature vector at each node is aggregated through each SAGEConv
layer. This information is then normalized and non-linearized using BatchNorm and the
ReLU function immediately after each SAGEConv layer, as shown in Figure 6. In the
fully connected layer, the current feature vector of the nodes is transformed into vectors
with dimensions equal to the number of classes to be classified. This transformation is
achieved using flattening techniques and the weight matrix multiplication of this layer.
Finally, the softmax activation function is applied to the vector at each node to generate
classification probabilities. The result at each node is a probability vector whose entries
sum to 1. The model classifies nodes based on the highest probability in this vector,
corresponding to the one-hot encoding matrix presented in Figure 6.
Figure 6. The proposed model architecture: input, two SAGEConv layers each followed by BatchNorm and ReLU, a fully connected output layer, and a softmax producing one-hot class predictions.
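The output stage of the architecture (fully connected transform to class-sized vectors, softmax, and selection of the highest-probability class) can be sketched as follows; the toy embeddings and identity weights are invented for illustration:

```python
import numpy as np

def classify_nodes(embeddings, W_fc, b_fc):
    """Final stage: fully connected transform to class-sized vectors,
    softmax to probabilities (each row sums to 1), argmax to class labels."""
    logits = embeddings @ W_fc + b_fc
    # Numerically stable softmax over each node's logits.
    z = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = z / z.sum(axis=1, keepdims=True)
    return probs, probs.argmax(axis=1)

emb = np.array([[2.0, -1.0],     # node leaning toward class 0 (e.g., benign)
                [0.1, 0.3]])     # node leaning toward class 1 (e.g., attack)
W_fc = np.eye(2)                 # two classes; identity keeps the toy readable
probs, labels = classify_nodes(emb, W_fc, np.zeros(2))
```

Each row of `probs` is the per-node probability vector described above, and `labels` holds the index of the winning class for each node.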
We proposed this model based on the idea of combining the GCN model with the
SAGEConv module of the GraphSAGE model. GCN is the most popular model of GNN
presented in Figure 1. In the GCN model, GraphConv layers play a key role in learning
graph representations. These layers help the model effectively extract complex features
and structural information through multiple convolution calculations. This architecture
leads to the GCN model achieving high accuracy, but it has some limitations when working
with large graph data. With large graphs, the information of each node is aggregated
from all neighbors, making the data huge and causing system resource requirements and
computing time to increase significantly. Furthermore, information taken from all of these
neighbors may cause node embeddings to become similar; this phenomenon, called
over-smoothing, reduces the accuracy of the model.
On the other hand, SAGEConv is a variation of the GraphSAGE model, as introduced
in the previous section. It represents an improvement over GraphSAGE by employing
a more expressive convolutional operator. Unlike GraphSAGE, SAGEConv utilizes the
average of neighbor representations, normalized by the degree of each neighbor, as the
aggregate representation. This enhancement enables SAGEConv to capture more fine-
grained information about the graph’s structure.
Our model is the result of combining the advantages of both the GCN and SAGEConv
models. The main difference compared to the old model is that GraphConv layers are
replaced by SAGEConv layers. This combination makes the model more suitable for
training on large graph data while not sacrificing accuracy or performance. To apply
the model most effectively, parameters such as the number of hidden units and the
learning rate must be chosen appropriately for the best accuracy and stability. We conducted
experiments on benchmark datasets using the proposed model and achieved superior
results compared to previous methods. The detailed results and evaluation are presented
in the next section.
5. Experiment
In this section, the paper outlines the datasets chosen for training and testing, details
the evaluation criteria used in the experiments, and describes the experimental setup as
well as the selection of parameters for our model.
5.1. Datasets
A dataset for the network intrusion detection system includes many network traffic
flows combined with information about the network system, network devices, servers, and
user behavior. Raw data are firstly collected by capturing the network traffic generated in
the system through network devices such as routers and switches. They are then processed
using specific techniques to create dataset flows. These datasets are especially important
and necessary to evaluate malicious patterns and attacker behavior during cyber-attacks.
Datasets play an important role in training deep learning models, as they directly affect
prediction performance. Therefore, it is necessary to choose quality datasets suited to the
model and its intended purpose. In our experiments, the CIC-IDS2017 and UNSW-NB15
datasets were used to train and evaluate the performance of the proposed model.
The resulting dataset was labeled based on the timestamp, source and destination
IPs, source and destination ports, protocols, and types of attacks. We experimented using
424,155 flows randomly selected from the dataset, comprising 340,598 (80.3%) normal flows
and 83,557 (19.7%) malicious flows. Each flow in this dataset includes over 80 network flow
features extracted from captured network traffic.
5.2. Evaluation Metrics

Table 1. Evaluation metrics.
Metric      Definition
Recall      TP/(TP + FN)
Precision   TP/(TP + FP)
F1-score    (2 × Recall × Precision)/(Recall + Precision)
Accuracy    (TP + TN)/(TP + FP + TN + FN)
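The definitions in Table 1 can be computed directly from confusion-matrix counts; the counts below are hypothetical, for illustration only:

```python
def metrics(tp, fp, tn, fn):
    """Compute recall, precision, F1-score, and accuracy as defined in Table 1."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * recall * precision / (recall + precision)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return recall, precision, f1, accuracy

# Hypothetical counts: 90 attacks caught, 20 missed, 10 false alarms.
recall, precision, f1, accuracy = metrics(tp=90, fp=10, tn=880, fn=20)
# accuracy = 970/1000 = 0.97; f1 balances precision against recall.
```

On imbalanced IDS data, accuracy alone can look high even when many attacks are missed, which is why recall (the detection rate) and F1-score are reported alongside it.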
5.3. Implementation
The steps of our experimental process are described in Figure 8. Initially, we used
an appropriate feature selection technique to select a set of valuable features from each of
the CIC-IDS2017 and UNSW-NB15 datasets. Then, the obtained data were divided into
training and testing sets to create graph data for the proposed model. After completing the
training process, a node classification model with optimally selected parameters was used
for classifying malicious flows. We simulated the experiment on the hardware with:
Figure 8. The experimental process: feature selection using random forest regression on the benchmark dataset, followed by implementation of the modified GCN model with selected parameters for node classification, producing the results.
After normalization, the features within each flow are extracted to constitute the
input data for the model. The feature selection is based on the role of each type of feature
in defining a different type of cyberattack. Therefore, to effectively predict each type
of attack, the deep learning model requires diverse feature types. Those features are
evaluated based on the “important weight” index initially introduced in the random forest
regression algorithm [38]. The random forest method provides the advantage of assessing
the importance of each feature in class prediction based on its individual score. Evaluating
feature scores in a high-dimensional dataset can be challenging. To address this issue, the
random forest method utilizes these importance scores to automatically select a minimal set
of highly discriminatory features. Leveraging this index, the authors of [39] identified the
sets of four features that exert the most influence on each type of attack.
Twelve different attack types were considered, each corresponding to a set of four features
with which the model can classify and predict that attack type with the highest accuracy.
In this study, we classify network flows into normal and attack flows, with all attack types
grouped under a common ‘attack’ label. Following the feature selection approach of [39],
we combined the sets of four features mentioned above: the 4 features obtained for each
of the 12 attack types yielded a pool of 48 features, which was reduced to 18 after
eliminating duplicates. We selected these 18 features for the process of creating graph data.
of creating graph data. In addition, the features “Source IP” and “Destination IP” were
also selected because they are necessary for the graph creation process. Consequently,
20 features were selected from more than 80 features in each flow to construct a graph for
training the model. The list of these 20 features is presented in Table 2 below.
Similarly, for the UNSW-NB15 dataset, columns containing string data or categorical
attributes were converted into numerical format. The ‘attack-cat’ column, which included
a list of attack types, was removed. We utilized the ‘label’ column, which was labeled
with label 0 for benign flows and label 1 for attack flows. The Random Forest Regression
algorithm was used to assess the influence of the features based on the ’important weight’
index mentioned above. This index indicates the influence of each feature in class prediction
and always sums up to 1. After calculating the ‘important weight’, various threshold values
are used to determine the optimal number of features to select. The optimal number of
features is identified at the threshold where the model achieves the highest classification
accuracy. Based on the evaluation results, we selected a set of 32 features out of the total
49 features in the data to use for experiments on the proposed model. The list of 32 features
is provided in Table 3 below.
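The threshold-based selection described above can be sketched as follows; the feature names and importance weights are hypothetical, not the values computed in the paper:

```python
def select_features(importances, threshold):
    """Keep features whose 'important weight' meets the threshold.
    The weights are assumed to sum to 1 across all features."""
    return [name for name, w in importances.items() if w >= threshold]

# Hypothetical importance weights (summing to 1), for illustration only.
weights = {"dur": 0.30, "sbytes": 0.25, "dbytes": 0.20,
           "rate": 0.15, "sttl": 0.07, "dttl": 0.03}
selected = select_features(weights, threshold=0.10)
# Sweeping the threshold and re-evaluating the model at each setting
# identifies the value that maximizes classification accuracy.
```

With this toy threshold of 0.10, the four highest-weighted features survive, mirroring how the 32-of-49 subset was chosen for UNSW-NB15.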
Figure 10. The accuracy of the training history on the UNSW-NB15 dataset.
The metrics used to evaluate the effectiveness of the model were calculated based on
the formula presented in Table 1. The final evaluation results on two benchmark datasets
are described in Table 5.
Table 5. Final evaluation results on the two benchmark datasets.
Metric                    CIC-IDS2017 (Benign / Attack)    UNSW-NB15 (Benign / Attack)
Precision                 0.9988 / 0.9926                  0.9916 / 0.9801
Recall (Detection rate)   0.9882 / 0.9951                  0.9838 / 0.9897
F1-score                  0.9985 / 0.9938                  0.9877 / 0.9849
Weighted F1               0.9976                           0.9865
Accuracy                  0.9976                           0.9864
The results indicated that the detection rates (recall) range from 98.38% to 99.51% in
classifying both malicious and normal flows for both datasets. This demonstrated a high
level of confidence. Figures 13 and 14 illustrate the model’s accuracy in detecting malicious
flows on two test datasets using ROC curves. Furthermore, due to the imbalance in the
number of flow types within the datasets, the weighted F1 score was used for evaluation
instead of solely relying on accuracy. Weighted F1 metrics were computed based on the
F1-score values of various flows and their allocation counts in the test dataset. Accordingly,
our model achieves weighted F1 scores of 99.76% and 98.65% on the CIC-IDS2017 and
UNSW-NB15 datasets, respectively. We used these metrics to compare the performance of
the proposed method with previous methods.
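As a concrete check of the weighted-F1 computation, the per-class F1 scores can be combined using the class counts as weights. The F1 values below are the CIC-IDS2017 results from Table 5; the class counts are illustrative assumptions, not figures from the paper.

```python
# Weighted F1 = per-class F1 scores weighted by each class's share of flows.
# F1 values are from Table 5 (CIC-IDS2017); the support counts are assumed.
f1_benign, f1_attack = 0.9985, 0.9938
n_benign, n_attack = 70000, 14831   # assumed class counts in the test set

weighted_f1 = (f1_benign * n_benign + f1_attack * n_attack) / (n_benign + n_attack)
print(round(weighted_f1, 4))
```

With scikit-learn, the same quantity is obtained directly from predictions via `f1_score(y_true, y_pred, average='weighted')`.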
Table 6 presents comparative data on the performance and execution time of the pro-
posed model with the E-GraphSAGE and conventional GCN models. These experiments
were performed with 424,155 flows taken from the CIC-IDS2017 dataset, of which 339,324
flows form the training set and 84,831 flows form the test set. Comparative
data show that the proposed model consistently outperforms the other models in precision,
recall, and F1-score, meaning that it detects attack flows at a higher and more accurate
rate. At the same time, its training and prediction times are not excessive compared with
those of the previous models. This can be explained by how graph data are created from
network flows, as well as by the architecture of each
model. For the E-GraphSAGE model, graph data are created based on the network topology,
in which each IP address corresponds to a node, and the number of edges represents
the number of flows generated between those nodes. In contrast, the FN-GNN model
uses a new data preprocessing method that exploits the relationships between flows when
creating graph data: each node represents a flow, and edges are created according to the
connections between flows in the network.
We observed that, for the same amount of network flow, the graph data of the proposed
model have a larger number of nodes and edges, and are therefore more complex than the
graph data of E-GraphSAGE. Consequently, the training and prediction times of the proposed
model are slightly longer than those of E-GraphSAGE.
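The flow-graph construction described above can be sketched as follows: each flow becomes a node carrying its selected features, and an edge connects every pair of flows that share a source IP. The field names and sample flows here are illustrative assumptions, not the paper's data format.

```python
# Sketch of the flow-as-node graph construction: nodes are flows, and flows
# sharing a source IP are connected. Field names and flows are illustrative.
from itertools import combinations

flows = [
    {"id": 0, "src_ip": "10.0.0.1", "feat": [0.2, 1.0]},
    {"id": 1, "src_ip": "10.0.0.1", "feat": [0.4, 0.0]},
    {"id": 2, "src_ip": "10.0.0.2", "feat": [0.9, 1.0]},
]

# Group flow (node) ids by source IP, then connect every pair in a group.
by_src = {}
for f in flows:
    by_src.setdefault(f["src_ip"], []).append(f["id"])

edges = [pair for ids in by_src.values() for pair in combinations(ids, 2)]
print(edges)  # one edge between the two flows originating from 10.0.0.1
```

Because edge count grows with the number of flows sharing an IP rather than with the number of hosts, the resulting graph is denser than E-GraphSAGE's IP-as-node graph for the same traffic.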
On the other hand, the proposed model has a more optimal execution time than the
conventional GCN model. The GCN layer included in the GCN model uses the entire
adjacency matrix to synthesize information from neighboring nodes, while the SAGEConv
module applied in FN-GNN helps represent nodes by synthesizing information from
several neighboring nodes. Therefore, for large graph data, the GCN model must process
information from a vast number of connections, significantly increasing the system resource
requirements and computation time. In contrast, the neighbor sampling mechanism enables
the proposed model to efficiently utilize resources and optimize the computation time
when applied in large-scale network environments.
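This difference can be illustrated with a toy mean aggregator: a GCN layer aggregates over all neighbors via the full adjacency matrix, whereas a SAGEConv-style aggregator samples at most k neighbors per node. The pure-Python sketch below only illustrates the sampling idea; it is not the actual SAGEConv implementation.

```python
# Toy illustration of GraphSAGE-style neighbor sampling: each node averages
# the features of at most k sampled neighbors instead of all of them.
import random

random.seed(0)
features = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
neighbors = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

def sage_aggregate(node, k):
    """Mean over at most k sampled neighbors (GraphSAGE mean aggregator)."""
    sampled = random.sample(neighbors[node], min(k, len(neighbors[node])))
    return sum(features[n] for n in sampled) / len(sampled)

# A GCN-style full aggregation touches all 3 neighbors of node 0 ...
full = sum(features[n] for n in neighbors[0]) / len(neighbors[0])
# ... while sampling with k=2 touches only 2 of them.
agg = sage_aggregate(0, k=2)
print(full, agg)
```

On a large flow graph, bounding the per-node work to k neighbors is what keeps the proposed model's computation time and resource usage manageable.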
To objectively assess the model's performance, we compared its weighted F1 scores with
those of other existing models, based on results published on the same datasets. The selected
models are those that reported the most outstanding results on the datasets used in this study.
On the CIC-IDS2017 dataset, we compared with models using machine learning
techniques such as OC-SVM/RF [41], SVM, and ANN [42], as well as deep learning models
like CNN-GRU [43], CNN-BiLSTM [44], and a two-phase intrusion detection system with
naïve Bayes [45]. Similarly, on the UNSW-NB15 dataset, models such as AdaBoost, SVM,
and DNN [46], as well as deep learning models like XGBoost-LSTM [47], AT-LSTM [48], and
CNN-GRU [43], are also compared with our model. The comparison results are presented
in Figures 16 and 17. Our model demonstrates superior performance and high
stability across multiple datasets, for both malicious and normal flow classification.
Furthermore, the effectiveness of the proposed model was also evaluated through
comparison with several models employing the same feature selection approach. Specifi-
cally, feature selection techniques based on the random forest regression algorithm were
also applied by the authors in papers [39,49] on the CIC-IDS2017 and UNSW-NB15 datasets,
respectively. With the same selected features from the dataset, our model achieves signifi-
cantly higher effectiveness. The evaluation results are shown in Table 7. This improvement
can be attributed to the GNN model's ability to exploit the relationships between flow data
by aggregating information from the neighbors of each node.
Figure 16. The comparison of the F1-score between the proposed model and the state-of-the-art
models on the CIC-IDS2017 dataset.
Figure 17. The comparison of the F1-Score between the proposed model and the state-of-the-art
models on the UNSW-NB15 dataset.
Table 7. Comparison results of the proposed model with existing models using the same feature
selection method.
Experimental results on the two datasets CIC-IDS2017 and UNSW-NB15 show that the
proposed model achieves superior performance and is more stable than existing models.
To evaluate the effectiveness of the FN-GNN model when deployed in practice, the model’s
computational performance and ability to adapt to dynamic network conditions are aspects
that need to be carefully considered. We have presented the computational efficiency of
the model in Table 6. In practice, an NIDS continuously captures network traffic, which is
real-time in nature and changes over time. New attack scenarios and techniques alter the
nature of flows and introduce new traffic patterns into the data captured by the IDS.
Furthermore, changes in network topology also lead to significant changes in network
traffic. These changes require NIDS systems to be constantly updated and able to adapt
to new traffic patterns. The FN-GNN
model provides a method for generating graph data based on evaluating the relationships
of flows without depending on changes in network topology. Consequently, the graph
data are generated flexibly according to the variations in network traffic. This approach
allows the FN-GNN model to adapt to changes in network architecture within dynamic
network environments.
Additionally, in our experiments, we used two datasets that are among the most
relevant and representative of real-world environments. Specifically, these datasets include
scenarios and attack types that closely resemble actual attack patterns encountered in
practical settings. By using these datasets, we ensure that our experiments reflect realistic
network conditions and potential threats, thereby enhancing the applicability and effective-
ness of our proposed model in real-world scenarios. Moreover, the test data had not been
seen by the FN-GNN model before, so the evaluation approximates the model's behavior
on real-world data. Since we do not expect network environments to change drastically in
the near future, the deep learning properties of the proposed model and its ability to capture
complex patterns should help it maintain strong performance in practical environments.
7. Conclusions
In this paper, we proposed the FN-GNN model including the data preprocessing for
graph creation and the modified GCN model. In the data preprocessing, we introduced a
novel approach to represent network flow data as graph data. In this model, the nodes of
the graph represent a set of important features of the flow extracted by using the Random
Forest Regression algorithm. The edges are created based on the relationship between flows
through the source IP feature. The modified GCN model is a combination of conventional
GCN and SAGEConv models. This helps overcome the limitations of previous GNN models.
The proposed model achieved high stability and an accuracy of 99.76% and 98.65% for the
CIC-IDS2017 and UNSW-NB15 datasets, respectively. The evaluation results demonstrated
that our model classifies both normal and malicious flows consistently and outperforms
recent state-of-the-art models. However, we recognize that NIDS systems will always face
new challenges. Thus, we plan to continue researching and
updating the proposed model with newer datasets to enable the system to promptly detect
complex attack scenarios when deployed in real-world environments.
Author Contributions: Conceptualization, D.-H.T. and M.P.; methodology, D.-H.T. and M.P.; soft-
ware, D.-H.T.; validation, D.-H.T. and M.P.; formal analysis, D.-H.T.; investigation, D.-H.T. and
M.P.; writing—original draft preparation, D.-H.T.; writing—review and editing, D.-H.T. and M.P.;
supervision, M.P.; project administration, M.P.; funding acquisition, M.P. All authors have read and
agreed to the published version of the manuscript.
Funding: This work was jointly supported by the National Research Foundation of Korea (NRF)
via a grant provided by the Korea government (MSIT) (grant no. NRF-2023R1A2C1005461), and
by the MSIT (Ministry of Science and ICT), Korea, under the Convergence Security Core Talent
Training Business Support Program (IITP-2024-RS-2024-00426853) supervised by the IITP (Institute of
Information & Communications Technology Planning & Evaluation).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The original contributions presented in the study are included in the
article, further inquiries can be directed to the corresponding author.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Liao, H.J.; Richard Lin, C.H.; Lin, Y.C.; Tung, K.Y. Intrusion detection system: A comprehensive review. J. Netw. Comput. Appl.
2013, 36, 16–24. [CrossRef]
2. Hubballi, N.; Suryanarayanan, V. False alarm minimization techniques in signature-based intrusion detection systems: A survey.
Comput. Commun. 2014, 49, 1–17. [CrossRef]
3. Bhuyan, M.H.; Bhattacharyya, D.K.; Kalita, J.K. Network Anomaly Detection: Methods, Systems and Tools. IEEE Commun. Surv.
Tutor. 2014, 16, 303–336. [CrossRef]
4. Khraisat, A.; Gondal, I.; Vamplew, P.; Kamruzzaman, J. Survey of intrusion detection systems: Techniques, datasets and challenges.
Cybersecurity 2019, 2, 20. [CrossRef]
5. Do, D.P.; Kim, T.; Na, J.; Kim, J.; Lee, K.; Cho, K.; Hwang, W. D3T: Distinctive Dual-Domain Teacher Zigzagging Across
RGB-Thermal Gap for Domain-Adaptive Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 23313–23322.
6. Duong, M.T.; Lee, S.; Hong, M.C. DMT-Net: Deep Multiple Networks for Low-Light Image Enhancement Based on Retinex
Model. IEEE Access 2023, 11, 132147–132161. [CrossRef]
7. An Nguyen, T.; Lee, J. Design of Non-Isolated Modulation Code with Minimum Hamming Distance of 3 for Bit-Patterned
Media-Recording Systems. IEEE Trans. Magn. 2023, 59, 1–5. [CrossRef]
8. Nguyen, T.; Lee, J. Interference Estimation Using a Recurrent Neural Network Equalizer for Holographic Data Storage Systems.
Appl. Sci. 2023, 13, 11125. [CrossRef]
9. Nguyen, T.A.; Lee, J. A Nonlinear Convolutional Neural Network-Based Equalizer for Holographic Data Storage Systems. Appl.
Sci. 2023, 13, 13029. [CrossRef]
10. Dang, X.T.; Nguyen, H.V.; Shin, O.S. Optimization of IRS-NOMA-Assisted Cell-Free Massive MIMO Systems Using Deep
Reinforcement Learning. IEEE Access 2023, 11, 94402–94414. [CrossRef]
11. Nguyen, T.A.; Park, M. DoH Tunneling Detection System for Enterprise Network Using Deep Learning Technique. Appl. Sci.
2022, 12, 2416. [CrossRef]
12. Sarhan, M.; Layeghy, S.; Moustafa, N.; Portmann, M. NetFlow Datasets for Machine Learning-Based Network Intrusion Detection
Systems. In Proceedings of the Big Data Technologies and Applications, Virtual Event, 11 December 2020; Deze, Z., Huang, H.,
Hou, R., Rho, S., Chilamkurti, N., Eds.; Springer: Cham, Switzerland, 2021; pp. 117–135.
13. Tomar, K.; Bisht, K.; Joshi, K.; Katarya, R. Cyber Attack Detection in IoT using Deep Learning Techniques. In Proceedings of the
2023 6th International Conference on Information Systems and Computer Networks (ISCON), Mathura, India, 3–4 March 2023;
pp. 1–6.
14. Busch, J.; Kocheturov, A.; Tresp, V.; Seidl, T. NF-GNN: Network Flow Graph Neural Networks for Malware Detection and
Classification. In Proceedings of the 33rd International Conference on Scientific and Statistical Database Management, Tampa, FL,
USA, 6–7 July 2021.
15. Zhao, J.; Liu, X.; Yan, Q.; Li, B.H.; Shao, M.; Peng, H. Multi-attributed heterogeneous graph convolutional network for bot
detection. Inf. Sci. 2020, 537, 380–393. [CrossRef]
16. Pujol-Perich, D.; Suarez-Varela, J.; Cabellos-Aparicio, A.; Barlet-Ros, P. Unveiling the potential of Graph Neural Networks for
robust Intrusion Detection. SIGMETRICS Perform. Eval. Rev. 2022, 49, 111–117. [CrossRef]
17. Zhou, J.; Xu, Z.; Rush, A.M.; Yu, M. Automating Botnet Detection with Graph Neural Networks. arXiv 2020, arXiv:2003.06344.
18. Gong, L.; Cheng, Q. Exploiting edge features for graph neural networks. In Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9211–9219.
19. Jiang, X.; Zhu, R.; Ji, P.; Li, S. Co-Embedding of Nodes and Edges With Graph Neural Networks. IEEE Trans. Pattern Anal. Mach.
Intell. 2023, 45, 7075–7086. [CrossRef]
20. Casas, P.; Vanerio, J.; Ullrich, J.; Findrik, M.; Barlet-Ros, P. GRAPHSEC–Advancing the Application of AI/ML to Network
Security Through Graph Neural Networks. In Proceedings of the International Conference on Machine Learning for Networking,
Paris, France, 28–30 November 2022; Springer: Cham, Switzerland, 2022; pp. 56–71.
21. Schlichtkrull, M.; Kipf, T.; Bloem, P.; van den Berg, R.; Titov, I.; Welling, M. Modeling Relational Data with Graph Convolutional
Networks. In Proceedings of the Extended Semantic Web Conference, Portoroz, Slovenia, 28 May–1 June 2017.
22. Pang, B.; Fu, Y.; Ren, S.; Wang, Y.; Liao, Q.; Jia, Y. CGNN: Traffic Classification with Graph Neural Network. arXiv 2021,
arXiv:2110.09726.
23. Bekerman, D.; Shapira, B.; Rokach, L.; Bar, A. Unknown malware detection using network traffic classification. In Proceedings of
the 2015 IEEE Conference on Communications and Network Security (CNS), Florence, Italy, 28–30 September 2015; pp. 134–142.
24. Xiao, Q.; Liu, J.; Wang, Q.; Jiang, Z.; Wang, X.; Yao, Y. Towards Network Anomaly Detection Using Graph Embedding. In Pro-
ceedings of the Computational Science–ICCS 2020: 20th International Conference, Amsterdam, The Netherlands, 3–5 June 2020.
25. Bilot, T.; Madhoun, N.E.; Agha, K.A.; Zouaoui, A. Graph Neural Networks for Intrusion Detection: A Survey. IEEE Access
2023, 11, 49114–49139. [CrossRef]
26. Tran, D.H.; Park, M. Graph Embedding for Graph Neural Network in Intrusion Detection System. In Proceedings of the 2024
International Conference on Information Networking (ICOIN), Ho Chi Minh City, Vietnam, 17–19 January 2024; pp. 395–397.
27. Zhang, B.; Li, J.; Chen, C.; Lee, K.; Lee, I. A Practical Botnet Traffic Detection System Using GNN; Springer: Berlin/Heidelberg, Germany,
2021; pp. 66–78.
28. Rusch, T.; Bronstein, M.; Mishra, S. A Survey on Oversmoothing in Graph Neural Networks. arXiv 2023, arXiv:2303.10993.
29. Vinayakumar, R.; Soman, K.P.; Poornachandran, P. Applying convolutional neural network for network intrusion detection.
In Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI),
Udupi, India, 13–16 September 2017; pp. 1222–1228.
30. Ahmad, Z.; Khan, A.S.; Cheah, W.S.; bin Abdullah, J.; Ahmad, F. Network intrusion detection system: A systematic study of
machine learning and deep learning approaches. Trans. Emerg. Telecommun. Technol. 2020, 32, e4150. [CrossRef]
31. Yin, C.; Zhu, Y.; Fei, J.; He, X. A Deep Learning Approach for Intrusion Detection Using Recurrent Neural Networks. IEEE Access
2017, 5, 21954–21961. [CrossRef]
32. Caville, E.; Lo, W.W.; Layeghy, S.; Portmann, M. Anomal-E: A self-supervised network intrusion detection system based on graph
neural networks. Knowl.-Based Syst. 2022, 258, 110030. [CrossRef]
33. Zhang, S.; Tong, H.; Xu, J.; Maciejewski, R. Graph convolutional networks: A comprehensive review. Comput. Soc. Netw.
2019, 6, 11. [CrossRef] [PubMed]
34. Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. Adv. Neural Inf. Process. Syst. 2017, 30,
1024–1034.
35. Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. Toward Generating a New Intrusion Detection Dataset and Intrusion
Traffic Characterization. In Proceedings of the International Conference on Information Systems Security and Privacy,
Funchal—Madeira, Portugal, 22–24 January 2018.
36. Sharafaldin, I.; Gharib, A.; Habibi Lashkari, A.; Ghorbani, A. Towards a Reliable Intrusion Detection Benchmark Dataset. Softw.
Netw. 2017, 2017, 177–200. [CrossRef]
37. Moustafa, N.; Slay, J. UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15
network data set). In Proceedings of the 2015 Military Communications and Information Systems Conference (MilCIS),
Canberra, NSW, Australia, 10–12 November 2015; pp. 1–6.
38. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.;
et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
39. Kostas, K. Anomaly Detection in Networks Using Machine Learning. Ph.D. Thesis, University of Essex, Essex, UK, 2018.
40. Lo, W.W.; Layeghy, S.; Sarhan, M.; Gallagher, M.R.; Portmann, M. E-GraphSAGE: A Graph Neural Network based Intrusion
Detection System for IoT. In Proceedings of the NOMS 2022—2022 IEEE/IFIP Network Operations and Management Symposium,
Budapest, Hungary, 25–29 April 2022; pp. 1–9.
41. Verkerken, M.; D’hooge, L.; Sudyana, D.; Lin, Y.D.; Wauters, T.; Volckaert, B.; De Turck, F. A Novel Multi-Stage Approach for
Hierarchical Intrusion Detection. IEEE Trans. Netw. Serv. Manag. 2023, 20, 3915–3929. [CrossRef]
42. Chua, T.H.; Salam, I. Evaluation of Machine Learning Algorithms in Network-Based Intrusion Detection Using Progressive
Dataset. Symmetry 2023, 15, 1251. [CrossRef]
43. Bakhshi, T.; Ghita, B. Anomaly Detection in Encrypted Internet Traffic Using Hybrid Deep Learning. Secur. Commun. Netw.
2021, 2021, 5363750. [CrossRef]
44. Ghani, H.; Virdee, B.; Salekzamankhani, S. A Deep Learning Approach for Network Intrusion Detection Using a Small Features
Vector. J. Cybersecur. Priv. 2023, 3, 451–463. [CrossRef]
45. Vishwakarma, M.; Kesswani, N. A new two-phase intrusion detection system with Naïve Bayes machine learning for data
classification and elliptic envelop method for anomaly detection. Decis. Anal. J. 2023, 7, 100233. [CrossRef]
46. Wang, Z.; Liu, Y.; He, D.; Chan, S. Intrusion detection methods based on integrated deep learning model. Comput. Secur.
2021, 103, 102177. [CrossRef]
47. Kasongo, S.M. A deep learning technique for intrusion detection system using a Recurrent Neural Networks based framework.
Comput. Commun. 2023, 199, 113–125. [CrossRef]
48. Alsharaiah, M.; Abualhaj, M.; Baniata, L.; Al-saaidah, A.; Kharma, Q.; Al-Zyoud, M. An innovative network intrusion detection
system (NIDS): Hierarchical deep learning model based on Unsw-Nb15 dataset. Int. J. Data Netw. Sci. 2024, 8, 709–722. [CrossRef]
49. Kharwar, A.; Thakor, D. A Random Forest Algorithm under the Ensemble Approach for Feature Selection and Classification. Int.
J. Commun. Netw. Distrib. Syst. 2023, 29, 426–447.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.