
algorithms

Review

Advancements in Machine Learning-Based Intrusion Detection in IoT: Research Trends and Challenges

Márton Bendegúz Bankó, Szymon Dyszewski, Michaela Králová, Márton Bertalan Limpek, Maria Papaioannou, Gaurav Choudhary and Nicola Dragoni *

DTU Compute, Technical University of Denmark, 2800 Kongens Lyngby, Denmark; [email protected] (M.B.B.);
[email protected] (S.D.); [email protected] (M.K.); [email protected] (M.B.L.); [email protected] (M.P.);
[email protected] (G.C.)
* Correspondence: [email protected]

Abstract: This paper presents a systematic literature review based on the PRISMA model
on machine learning-based Distributed Denial of Service (DDoS) attacks in Internet of
Things (IoT) networks. The primary objective of the review is to compare research trends
on deployment options, datasets, and machine learning techniques used in the domain
between 2019 and 2024. The results highlight the dominance of certain datasets (BoT-IoT
and TON_IoT) in combination with Decision Tree (DT) and Random Forest (RF) models,
achieving high median accuracy rates (>99%). This paper discusses various datasets that
are used to train and evaluate machine learning (ML) models for detecting Distributed
Denial of Service (DDoS) attacks in Internet of Things (IoT) networks and how they impact
model performance. Furthermore, the findings suggest that due to hardware limitations,
there is a preference for lightweight ML solutions and preprocessed datasets. Current
trends indicate that larger or industry-specific datasets will continue to gain popularity
alongside more complex ML models, such as deep learning. This emphasizes the need
for robust and scalable deployment options, with Software-Defined Networks (SDNs)
offering flexibility, edge computing being extensively explored in cloud environments, and
blockchain-integrated networks emerging as a promising approach for enhancing security.

Academic Editors: Shun Zhang, Feng Gao and Mingyang Ma

Received: 8 March 2025
Revised: 26 March 2025
Accepted: 4 April 2025
Published: 9 April 2025

Citation: Bankó, M.B.; Dyszewski, S.; Králová, M.; Limpek, M.B.; Papaioannou, M.; Choudhary, G.; Dragoni, N. Advancements in Machine Learning-Based Intrusion Detection in IoT: Research Trends and Challenges. Algorithms 2025, 18, 209. https://doi.org/10.3390/a18040209

Copyright: © 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Keywords: intrusion detection system (IDS); internet of things (IoT) network; distributed denial of service (DDoS); machine learning (ML); systematic review

1. Introduction

The IoT refers to devices that are connected and communicate with one another, typically described as an IoT network. Whether they are used in your smart home, agriculture, or medical tools, IoT devices are important to the functioning of the digital world and are quite vulnerable from a cybersecurity perspective. One of the key properties of these devices is that they are resource-limited, making them an obvious target of different forms of cyberattacks, including DDoS. Given their wide adoption in daily life and industry, estimates vary regarding the actual number of IoT devices in use, with some sources claiming that there were expected to be approximately 20 billion IoT devices [1] by the end of 2024. However, Forbes claimed a figure over tenfold of that [2]. That being said, the need for a robust security solution for this popular technology is obvious.

Intrusion detection systems (IDSs) based on ML appear to be a natural first choice to explore. ML's flexibility offers a number of benefits, including real-time detection of DDoS attacks through network analysis. Furthermore, the option of running ML algorithms on different systems can prove useful when memory and computational power are limited.



Hence, we use this review as a foundational point to discuss the ML techniques that the
current literature proposes. In order to draw more accurate conclusions from our research,
we identify the datasets that the ML models are trained and tested on, as different datasets
may vary in important properties, which may in turn impact the performance metrics of
the proposed models. Finally, we also consider the deployment environments proposed by
the reviewed papers. Given the constraints of IoT technology, the deployment options can
have an impact on the construction of the ML models and vice versa.
This paper presents a large-scale, in-depth analysis with a distinct focus on DDoS
attacks, distinguishing it from more general or smaller-scale surveys, which are discussed
in greater detail in Section 3. Additionally, our work provides a comprehensive, multi-
dimensional examination of the field, covering deployment strategies, dataset usage, and
a comparative evaluation of ML models, offering a level of detail and a combination of
insights not found in previous reviews.
This paper aims to follow the PRISMA structure in building up a literature review on
ML-based IDSs for DDoS attacks in IoT networks. Our contributions to the topic include
creating an updated survey on the deployment options, datasets, and ML techniques used
in the academic literature published on the topic. We also compare our results in order to
find consistent characteristics as well as gaps in current research and offer direction for
future research. In particular, we aim to answer the following research questions:
1. What are the deployment/platform solutions using machine learning proposed for
mitigating DDoS attacks on IoT networks?
2. Which datasets are used to train and evaluate ML models for detecting DDoS attacks
in IoT networks and how do they impact model performance?
3. How do different ML models compare based on performance metrics, and what
factors contribute to achieving high accuracy rates?
4. What trends have emerged in the use of machine learning for DDoS detection in
IoT networks?
While other reviews on this topic have been published in the past 6 years, this paper
stands out in that it includes significantly more reviewed literature than the comparable
surveys. Moreover, this paper focuses on DDoS attacks and creates a broad survey of
relevant surrounding elements, such as the datasets used, various performance metrics,
and deployment contexts, which other papers may omit. Furthermore, with the PRISMA
framework, this paper is replicable and follows a standardized research structure.
To give an outline of this paper, Section 1 introduces the paper and its motivation, while
Section 2 denotes the methodology this paper is based on. Section 3 provides the reader
with a foundational understanding of the discussed domain, while Sections 4–6 offer a full
overview of the literature reviewed. We discuss the results found in Section 7, propose
future research directions in Section 8, and conclude the survey in Section 9.

2. Methodology
2.1. Systematic Literature Review Strategy
Selecting a structured framework for conducting a systematic literature review is a customary and essential practice in academic research. It ensures methodological precision, transparency, and reproducibility, enabling researchers to draw reliable conclusions. Because of its widely accepted and utilized status in the scientific community, we elected to conduct our review using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [3] framework, ensuring a structured and transparent research process. The PRISMA framework, first introduced in 2009 and later refined, is a widely recognized standard for systematic reviews and meta-analyses. It

provides comprehensive guidelines for conducting research, making it particularly suitable for studies synthesizing evidence from multiple sources. By adopting PRISMA, this study
adheres to best practices in research synthesis, ensuring clarity and credibility. In this paper,
we are using the PRISMA 2020 version.
The framework consists of two main items: the PRISMA flow diagram [4] and the
PRISMA checklist [5].
The PRISMA flow diagram visually represents the process of paper selection for the
study in four key stages:
• Identification—records identified through database searches and additional sources.
• Screening—records screened based on titles and abstracts after duplicates
are removed.
• Eligibility—full text of articles assessed against predefined criteria for inclusion
or exclusion.
• Inclusion—final studies included in qualitative and quantitative synthesis.
The PRISMA checklist is a structured set of 27 points aimed at ensuring that all critical aspects of a systematic review are addressed, covering search strategies in databases, the selection process, and other assessment criteria.
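As a rough illustration, the four flow-diagram stages can be modeled as simple record-count bookkeeping. The identification and duplicate counts below match those reported later in Section 2.2 (65 + 20 records, 15 duplicates); the screening and full-text exclusion counts are hypothetical placeholders, not the actual figures of this review.

```python
# Illustrative PRISMA flow-diagram bookkeeping.
# Each stage removes records and carries the remainder forward.

def prisma_flow(identified, duplicates, screened_out, full_text_excluded):
    """Return the number of records surviving each PRISMA stage."""
    after_dedup = identified - duplicates            # Identification -> Screening
    after_screening = after_dedup - screened_out     # Screening -> Eligibility
    included = after_screening - full_text_excluded  # Eligibility -> Inclusion
    return {
        "identified": identified,
        "screened": after_dedup,
        "eligible": after_screening,
        "included": included,
    }

# 85 records identified, 15 duplicates (as in Section 2.2);
# 20 and 10 exclusions are hypothetical example values.
print(prisma_flow(85, 15, 20, 10))
```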

2.2. Databases and Paper Selection


Our main search tool was the DTU FindIt [6] tool/library, which offers access to a
wide variety of papers and articles provided to DTU students and staff. As students of the
university, we had access to a larger variety of resources compared to a user without login
credentials; therefore, a logged-in account is necessary to replicate the results described in
this section. This tool aggregates search results from different research databases (although
sometimes partially [7]), including the Institute of Electrical and Electronics Engineers
(IEEE), Association for Computing Machinery (ACM), and arXiv, and provides access to
full-text papers.
Additionally, we also used IEEE Xplore and its search tool. This library is a leading
digital collection for scientific and technical research, providing access to peer-reviewed
journals, conference proceedings, and technical standards published by the IEEE and its
partners. It is widely recognized as a critical resource for engineering, computer science,
and technology research. Compared to our search with DTU FindIt, we were able to find
new studies with IEEE Xplore and append our list of publications considered.
We also inspected the Association for Computing Machinery (ACM) library, a widely
respected resource offering access to peer-reviewed journal articles, and arXiv, an open
access repository for pre-prints and research articles independent of DTU FindIt; however,
the results searching these databases were already covered by DTU FindIt’s search results
in full.
As these databases’ search tools allow for an advanced query, we opted for a detailed
search term with multiple keywords to identify the specific papers to be included in this
review. Our query process went as follows:
1. Locate DTU FindIt and IEEE Xplore database (DTU login necessary);
2. Search for all metadata containing (“IoT” OR “Internet of Things”) AND (“Intrusion Detection System” OR “IDS”);
3. Search for abstract criteria, containing (“ML” OR “Machine Learning”) AND (“IoT
network” OR “IoT networks”) AND (“identification” OR “detection”) AND (“DDOS”
OR “Distributed Denial of Service”);
4. Include only items published between 2019 and 30 November 2024.
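Combining steps 2 and 3 above, the full search criterion can be expressed as a single boolean query string. This is an illustrative sketch only: the `metadata:` and `abstract:` field tags are placeholders, and the actual advanced-search syntax differs between DTU FindIt and IEEE Xplore.

```python
# Sketch of the boolean search query described in steps 2-3.
# The field tags below are illustrative; each database UI uses its own syntax.
metadata_terms = (
    '("IoT" OR "Internet of Things") '
    'AND ("Intrusion Detection System" OR "IDS")'
)
abstract_terms = (
    '("ML" OR "Machine Learning") AND ("IoT network" OR "IoT networks") '
    'AND ("identification" OR "detection") '
    'AND ("DDOS" OR "Distributed Denial of Service")'
)

query = f"metadata:({metadata_terms}) AND abstract:({abstract_terms})"
print(query)
print("date filter: 2019-01-01 TO 2024-11-30")
```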
Our searches resulted in the following number of hits per database:

• DTU FindIt: 65
• IEEE Xplore: 20
These contained 15 duplicates in total; therefore, we concluded our search by identify-
ing 70 unique papers.
During the selection process, certain papers identified in the initial database searches
were excluded for a variety of reasons. Some studies were removed due to incomplete
documentation, lack of peer review, or insufficient methodological details, which limited
their reliability for synthesis. Others were excluded because their scope extended beyond
the focus of this review. Duplicate records and papers presenting redundant findings were
also filtered out to ensure a focused and high-quality set of papers for further analysis.

2.3. Conformance with PRISMA Criteria


As outlined in the PRISMA checklist, we have adapted our methodology to ensure the
validity and transparency of our research. Our approach follows a structured process to
enhance reproducibility and adherence to best practices.
The selection process involved four reviewers who independently screened the records.
Initially, titles and abstracts were evaluated to exclude irrelevant studies. Full-text reviews
were then conducted for studies that met the preliminary criteria, with eligibility assessed
through a detailed evaluation by all four reviewers. Discrepancies between reviewers
were resolved through discussion. Similarly, data extraction was performed independently
by all four reviewers, focusing on selecting key points, which included study objectives,
infrastructure, datasets, ML methods, evaluation metrics, constraints, and results. Verifica-
tion steps ensured data consistency, and the reviewers worked to resolve situations where
information was unclear or unspecified. The research included quantitative outcomes
focused on performance metrics (accuracy, precision, recall, and F1 score) and qualitative
insights into methodologies, and, where applicable, confidence intervals were reported.
Missing data were handled through sensitivity checks, with independent reviews deter-
mining whether missing details were genuinely unavailable. Based on these evaluations,
decisions were made to either exclude, ignore, or estimate missing values using a best-
effort approach. Additional variables, such as dataset types and model architectures, were
recorded, while assumptions were documented. In reference to the heterogeneity of our
data, it was examined by comparing methodologies, datasets, and performance measures,
with no formal meta-regression conducted.
Regarding the finalized results, they were tabulated and visualized using performance
comparison tables and appropriate graphs, and a meta-analysis discovering trends and
patterns is described in the text. Furthermore, sensitivity analyses were conducted by
separating outliers, and their impact was assessed on performance. Certainty in the
evidence was assessed based on the consistency and transparency of reporting across
studies, while confidence levels were evaluated through qualitative assessments rather
than statistical grading systems.
To determine the risk of bias, all reviewers independently evaluated the selected
papers, comparing published results and identifying discrepancies. Any inconsistencies or
missing data were noted, with study design, dataset quality, and reporting transparency
considered in the assessment. Discrepancies were resolved through discussion to ensure a
fair and unbiased evaluation process.

3. Field Assessment
This chapter defines the key components of the reviewed topic. The components
discussed in the literature review are examined individually, but their interconnec-
tions—including associated challenges—are also emphasized.

3.1. IoT
There are many definitions to describe IoT. According to [8], “The IoT is a system
of networked physical objects that contain embedded hardware and software to sense
or interact with the physical world, including human beings”. Based on this definition,
IoT devices operate in a system or network, forming an interconnected ecosystem where
devices communicate and collaborate to perform specific tasks or achieve shared goals.
This connection enables seamless data exchange and interaction between devices and their
environments, driving automation, efficiency, and advanced analytics capabilities. We can
refer to such interconnected systems as IoT networks. The number of connected IoT devices
was expected to grow by 13% in 2024, reaching 18.8 billion, up from 16.6 billion in 2023,
which marked a 15% increase over 2022 [1]. IoT devices are now present in nearly every
industry, with some also forming part of critical infrastructure.
IoT security has become a crucial topic due to the inherent limitations of many IoT devices. These include low computational power, since such devices were originally designed to perform specific tasks efficiently. Furthermore, security-related features, such as encryption,
may not be fully implemented. While data collection and processing are essential for IoT
applications, privacy issues arise at various stages of this process [9]. Also, the nodes
within an IoT network are susceptible to numerous attacks aiming to disrupt the services
provided by the IoT or take over the entire network. One of the most significant security
threats to IoT systems is DDoS attacks [10,11]. As such, IoT network security has become a
well-researched field in the scientific community. For the keywords ‘IoT security’, IEEE Xplore [12] returns 4405 results between 2024 and 2025. This systematic literature review examines a subfield of IoT security and aims to present the current state of the field.

3.2. Machine Learning


ML is a set of methods used to train machines to make decisions or predictions based
on patterns learned from data. According to [13], ML involves creating programs that opti-
mize performance based on past data and experiences. It focuses on developing algorithms
that can learn from experience and enhance their performance over time. This learning
process may involve modifications to the program’s structure or data. Essentially, machine
learning aims to design programs that automatically improve their effectiveness through
experience. There are many types of ML algorithms that can be used, each with different
advantages and drawbacks. Various techniques exist to determine which methods should
be tested in a given scenario, considering factors such as available data, computational
resources, time constraints for decision-making, and other relevant parameters [14]. It
should be noted that machine learning usually involves both a model to be trained and a method by which it is trained, which depends on the type of ML, that is, supervised learning, unsupervised learning, or reinforcement learning. ML is a complicated
process in itself, and the performance of the model after training is dependent on several
factors, such as the quality of the test dataset and how performance is measured. As a
result, comparing different models can be challenging.
Since we consider DDoS detection to be a classification task (binary or multiclass), we
also chose the metrics/Key Performance Indicators (KPIs) accordingly. In the reviewed
papers, multiple KPIs were considered, including accuracy, precision, recall, and F1 score,
which allowed us to draw certain conclusions during analysis. However, performance
prediction and measurement are also highly dependent on other factors, such as the used
dataset and preprocessing of the data.
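Since the reviewed papers report these four KPIs, a minimal sketch of how they are derived from a binary confusion matrix may help when comparing results. These are the standard textbook definitions, not tied to any specific reviewed model; the example counts are made up.

```python
# Standard binary-classification metrics from confusion-matrix counts.
# tp/fp/fn/tn = true/false positives/negatives (attack = positive class).

def classification_metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Example: 95 attacks caught, 5 missed, 2 false alarms, 98 normal flows passed.
print(classification_metrics(tp=95, fp=2, fn=5, tn=98))
```

Note that accuracy alone can be misleading on the class-imbalanced datasets common in this domain, which is why precision, recall, and F1 score are reported alongside it.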

3.3. DDoS
A Denial of Service (DoS) attack (commonly referred to as a flood attack), in its simplest
form, involves configuring a device on the internet to repeatedly send requests to another
computer, bypassing the default settings of the command. The data size of each request
can be significantly increased, and the time interval between transmissions can be greatly
reduced. As a result, the target device becomes overwhelmed with an excessive amount
of unnecessary data, ultimately causing it to stop functioning properly. DDoS attacks
are highly covert and cause significant damage by allowing attackers to stay anonymous.
The process involves creating malicious code designed to target specific systems when
triggered. This code spreads across poorly secured systems on the internet and, once
activated, launches an attack from these infected systems simultaneously [15].
These types of attacks (DoS and DDoS) not only prevent legitimate users from ac-
cessing (essential) services, but may lead to further consequences, including increased
costs due to service downtime, recovery efforts, or missed opportunities. Consumed
bandwidth, processing power, and other network resources cause collateral damage to
surrounding systems.
DDoS attacks, while impossible to completely prevent due to the decentralized nature
of the internet, can be effectively managed through a combination of strategies. Adjusting
infrastructure configurations is a crucial step, as demonstrated in the late 1990s when
default router settings were changed to counter Smurf attacks. A Smurf attack is a type
of amplification attack where an attacker sends ICMP echo request packets (pings) to a
network’s broadcast address, spoofing the source IP to be the victim’s address. This causes
all devices on the network to respond with ICMP echo replies to the victim, overwhelming
their system with traffic. Similarly, addressing vulnerabilities like open recursive Domain
Name System (DNS) servers is critical to preventing DNS amplification attacks, although
progress in reconfiguring these servers remains slow. Filtering distinct or unusual traffic
patterns at ingress points is an effective method to minimize disruption by DoS or DDoS
attacks. For instance, upstream routers can block Internet Control Message Protocol (ICMP)
echo request traffic to stop ping flood attacks, while other anomalous traffic can be safely
discarded based on profile analysis. Distributed hosting infrastructures, such as Akamai,
are also useful for dispersing attack traffic across multiple highly connected nodes, reducing
the impact on any single target. However, short DNS Time-To-Live (TTL) values provide
limited benefit unless TTL entries are completely removed. A robust mitigation strategy
typically involves a multi-layered approach. Identifying and shutting down source Internet
Protocol (IP) addresses, deploying routing tricks to drop malicious traffic, and using
high-speed line filtering devices to manage extraneous traffic are all effective techniques.
Defensive measures like SYN proxies can also reduce the effectiveness of certain types
of attacks. SYN proxies are a defensive technique against DDoS attacks that involve
intercepting and managing TCP handshake requests. In large-scale scenarios where attack
traffic reaches tens of gigabits per second, collaboration with Internet Service Providers
(ISPs) to filter incoming traffic becomes essential to maintain normal operations and protect
the network [16].
This literature review aims to help researchers in leveraging machine learning
and artificial intelligence to identify and respond to DDoS attacks in real-time by presenting
the current state of the field. To achieve this, we take a closer look at popular datasets
and ML techniques already used in relevant studies. We try to identify general trends and
compare the outcomes of the respective studies.

3.4. IDS
According to Lee et al. [17], an IDS is a device or software application that monitors a
network for malicious activity or policy violations.
IDSs are essential tools for identifying unauthorized access or malicious activities in a
system. An intrusion involves accessing a system without authentication or authorization,
including activities like tampering with files, malware execution, or remote attempts to
compromise a system. Basic protective measures such as antivirus software and firewalls are
often insufficient, as malware signatures can be altered and firewall rules can be bypassed.
An IDS provides comprehensive monitoring by analyzing incoming and outgoing network
traffic as well as detecting intrusions, malicious packets, and policy violations. It records
logs and alerts system administrators in real time, offering robust security for organizations.
IDS solutions are also available as hardware platforms or software applications and are
increasingly adopting machine learning algorithms to predict attacks and classify legitimate
traffic. IDS systems are categorized into three main types: Network-based IDS (NIDS),
Host-based IDS (HIDS), and Distributed IDS (DIDS) [18].
IDSs can be classified by their detection methods, including signature-based anal-
ysis, which identifies known attack patterns; protocol-based analysis, which monitors
compliance with protocol rules to detect violations; and anomaly-based analysis, which
focuses on spotting unusual behavior to identify potential unknown threats. These meth-
ods collectively strengthen IDS capabilities in detecting and mitigating cyberattacks. The
anomaly-based analysis addresses limitations in signature-based approaches by detecting
unknown and known attacks through abnormal network behavior. Unlike relying on
predefined signatures, this method uses heuristic rules or machine learning to classify
traffic. It operates in two phases: training to learn the normal behavior of the system and
testing to identify deviations indicating anomalies. Techniques such as neural networks,
data mining, and artificial immune systems are used, supported by other tools. However, a
notable drawback is the occurrence of false positives, where alarms are triggered without
actual threats. Research continues to improve accuracy and reduce false alarms [18].
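The two-phase operation described above (training on normal behavior, then flagging deviations) can be illustrated with a deliberately simple statistical detector. This is a toy sketch using a mean/standard-deviation threshold on a single traffic feature; it stands in for the far richer techniques (neural networks, data mining, artificial immune systems) used in the reviewed literature.

```python
import statistics

# Toy anomaly-based detector: learn the normal range of one traffic
# feature (e.g., packets per second), then flag strong deviations.

class SimpleAnomalyDetector:
    def __init__(self, threshold_sigmas=3.0):
        self.threshold = threshold_sigmas
        self.mean = 0.0
        self.stdev = 1.0

    def train(self, normal_samples):
        # Phase 1 (training): learn the system's normal behavior.
        self.mean = statistics.fmean(normal_samples)
        self.stdev = statistics.stdev(normal_samples) or 1.0

    def is_anomalous(self, sample):
        # Phase 2 (testing): flag samples deviating strongly from normal.
        return abs(sample - self.mean) > self.threshold * self.stdev

detector = SimpleAnomalyDetector()
detector.train([100, 110, 95, 105, 98, 102])  # normal packets/sec
print(detector.is_anomalous(104))   # typical traffic
print(detector.is_anomalous(5000))  # flood-like spike
```

The threshold choice directly trades off the false positives mentioned above against missed attacks, which is exactly the tuning problem that ML-based approaches try to solve more systematically.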

3.5. ML in IDS for DDoS


This paper’s aim is to review the current state of the ML-based IDS for DDoS attack
detection. Based on the preliminary searches in the field, it can be seen that there are
many different approaches. As we discuss in Section 4, there are many possible ways
to implement the IDS from an infrastructural point of view. This impacts how the data
are processed, how quickly the system can give an alarm (or take action), and the overall
effectiveness of the system. Although this is an important question, many papers are mostly
focused on implementing the ML-based detection-related part and emphasize researching
an optimal method in terms of accuracy, precision, or other metrics, like F1 score.

3.6. Literature Reviews and Surveys


It is important to know what literature reviews are currently available on this topic. Among the papers obtained with our search term, literature reviews and surveys were also marked; in this way, we found two literature reviews in the domain. Additionally, we conducted another search on DTU FindIt, which yielded four additional relevant studies.
The literature reviews found are the following: [19] (2024), [20] (2021), [21] (2020), [22] (2023), [23] (2024), and [24] (2022). This is the number of surveys found with the original query in the mentioned databases, so the actual number of reviews would be higher if more databases were considered. It should be mentioned that the found literature reviews, in some cases, present a broader perspective of the topic and are not specifically looking into

DDoS detection. An overview of recent advancements in ML-based IDS systems for DDoS
detection in IoT networks can be seen in Table 1.

Table 1. State-of-the-art surveys in ML-based IDS systems on DDoS attacks in the IoT network domain (1: threats and attacks, 2: mitigation, 3: performance metrics, 4: research gap) [✓: included, -: not included].

[19] (2024) | 1: ✓, 2: -, 3: ✓, 4: ✓
Contribution: Focuses on anomaly detection in IoT networks using ML/DL techniques, covering not only DDoS attack detection but also other types of attacks.
Differences (with our survey): Our survey offers a broader range of metrics for evaluating machine learning techniques, beyond just accuracy, wherever feasible, while also providing a more comprehensive analysis of the infrastructural and deployment aspects.

[20] (2021) | 1: ✓, 2: ✓, 3: -, 4: -
Contribution: Provides a comprehensive survey of security issues in IoT layers (perception, network, support, and application layers), with a specific focus on DDoS attacks. It explores the types, impacts, and mitigation strategies for DDoS attacks in IoT environments. Discusses data preprocessing techniques as well.
Differences: Our survey presents quantitative data for various machine learning techniques, enabling a direct comparison based on these figures. However, it does not cover preprocessing techniques.

[21] (2020) | 1: -, 2: ✓, 3: -, 4: -
Contribution: Discusses various types of attacks at a higher level, including DDoS/DoS, hello flood, and Sybil attacks, and outlines different IDS approaches such as ML, SDN, and Automata-based IDS, which can aid in the prevention and detection of attacks on IoT devices.
Differences: Our paper delves deeper into the evaluation of machine learning-based intrusion detection systems (IDSs), providing a more extensive analysis supported by quantitative data.

[22] (2023) | 1: -, 2: ✓, 3: -, 4: ✓
Contribution: Presents a literature review on various types of DDoS attacks leveraging ML techniques. It evaluates the performance of models using multiple classifiers to identify the most accurate one. The paper compares boosted algorithms with standard machine learning algorithms, revealing that adaptive boosting and extreme gradient boosting achieved higher accuracy rates.
Differences: Our survey also delivers expanded insights into the infrastructural and deployment aspects.

[23] (2024) | 1: ✓, 2: -, 3: ✓, 4: -
Contribution: Examines the impact of DDoS attacks, noting their prevalence in cyber intrusions and the financial sector. It highlights the need for efficient IDSs to detect anomalies using ML models, addressing challenges like false negatives in signature-based methods and false positives in anomaly-based approaches. The study reviews supervised and semi-supervised ML techniques for traditional networks, IoT, Cloud, and SDN, proposing ensemble methods for adaptive rule-based ML algorithms.
Differences: Our paper explores the related machine learning data in greater detail and incorporates a broader selection of ML-based intrusion detection system (IDS) evaluation papers for comparison, addressing binary and multiclass classification separately.

[24] (2022) | 1: ✓, 2: ✓, 3: -, 4: -
Contribution: Discusses an IoT-specific IDS, focusing on detection, placement, deployment strategies, datasets, machine learning methods, and challenges faced by network IDSs. This paper does not discuss some important datasets, like Bot-IoT and TON_IoT.
Differences: Our survey offers a concrete comparison of machine learning-based intrusion detection systems (IDSs) with numerical data, alongside a more in-depth description of the deployment aspects.

The reviewed studies highlight the significance of ML- and DL-based IDS solutions for
detecting DDoS in IoT networks. While various approaches have been explored, including
anomaly detection, ensemble learning, and adaptive models, challenges such as dataset di-
versity, real-time implementation, and scalability remain. Compared to previous works, our
paper provides a more detailed evaluation of ML-based IDS systems, particularly focusing
on performance metrics, dataset utilization, and model comparisons. By addressing gaps
such as underexplored datasets and improving detection accuracy, our study contributes
to advancing IDS research in IoT security. Moreover, we prioritized including the latest
studies (published between 2018 and 2024) in our paper, further distinguishing it from
other surveys.

Algorithms 2025, 18, 209

4. Deployment of the IDS


Deployment refers to the process of implementing and integrating a system or ap-
plication within a specific environment. In the context of intrusion detection systems for
DDoS attack mitigation in IoT networks, deployment defines the physical and virtual
infrastructure where the IDS is installed and operated.
The infrastructure and deployment environment are critical because they directly affect
the performance, scalability, and security of the IDS. Factors such as network topology,
device count, data flow patterns, and computational resources play a significant role
in determining the effectiveness of detection mechanisms. Understanding deployment
environments also helps assess practical constraints, such as latency, processing overhead,
and hardware limitations.
In this chapter, we look at the different deployment types found in our review and
discuss them in detail.
Table 2 shows the categorization of all the papers in our research. Table 3 contains
details of papers where a specific infrastructure has been explicitly mentioned or discussed;
papers targeting general networks or not relating to deployment are excluded.

4.1. General IoT Networks


In most studies analyzed, there is no explicit discussion of specific network infrastruc-
tures or deployment. Instead, the primary focus is on the datasets employed to train and
evaluate machine learning models. This suggests that researchers prioritize data character-
istics over network configurations, as robust data are essential for building accurate and
generalized models.
Several datasets are frequently used in IoT-focused intrusion detection studies. These
datasets provide labeled traffic data, simulating both benign and malicious activities, which
are essential for training machine learning algorithms.
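The dataset-driven workflow described above can be outlined in a few lines. This is an illustrative sketch only; the CSV path and `label` column name are hypothetical placeholders, not taken from any reviewed paper:

```python
# Minimal sketch of the common dataset-centric IDS workflow: load labeled
# traffic records, split them, train a classifier, and report accuracy.
# The file name and column names are illustrative placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def train_ids(csv_path: str, label_col: str = "label") -> float:
    df = pd.read_csv(csv_path)
    # Keep only numeric features; the label column is separated out.
    X = df.drop(columns=[label_col]).select_dtypes("number")
    y = df[label_col]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0
    )
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))
```

In practice, the reviewed papers precede this step with heavier preprocessing (feature filtering, encoding, balancing), but the load–split–train–evaluate skeleton is common to nearly all of them.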
Architecturally, the datasets often mimic centralized IoT networks, with sensor nodes
transmitting data to a central server for processing. However, the specifics of these simu-
lated architectures remain secondary to data quality and distribution.
While datasets are discussed in more detail in Section 5, some comments can be
made on the infrastructure of the most popular datasets in our studies, TON_IoT, Bot-IoT,
and CIC-IDS2017. The TON_IoT dataset [25] comprises diverse data sources gathered
from telemetry datasets of IoT and IIoT (Industrial IoT) sensors. These datasets were
collected from a realistic, large-scale network environment that was developed. The Bot-IoT
dataset [26] is a network cluster connected to a public IoT hub by AWS, via the Message
Queuing Telemetry Transport (MQTT) protocol. CIC-IDS2017 [27] includes the profile of
25 users working with many IP-based protocols (HTTPS, FTP, SSH, etc.) in a complete
network topology that includes a modem, firewall, switches, and routers, with Windows,
Ubuntu, and Mac OS X operating systems.
The emphasis on data rather than deployment infrastructure suggests an effort to
ensure machine learning solutions are broadly applicable across multiple IoT environments.
Researchers appear to be developing models that generalize well rather than optimizing for
specific setups, which could limit applicability. This focus on generalization is particularly
relevant given the heterogeneity of IoT systems, where devices, communication protocols,
and deployment scales vary widely.
There are some studies that, while mostly focusing on the dataset and the ML methods,
mention a specific type of infrastructure, which receives more consideration.
In [28–30], there is a focus on fog computing, an approach to network architecture that
delegates a larger amount of computing power to nodes near the edge devices. As [28]
suggests, fog computing is a defensive strategy that improves security while also improving
network routing performance. In [30], the fog/cloud layer handles data processing and
model training, while the IDS filtering is executed at a lower level, at the IoT gateway.

4.2. Software-Defined Networks (SDNs)


Software-Defined Networking (SDN) is a modern network infrastructure that sepa-
rates the network controller into three distinct layers. At the application layer, business
logic interacts with the SDN controller to request network services and set configuration
rules. The Control Layer manages traffic and makes routing decisions. Finally, the Data
Layer consists of physical or virtual network devices, which forward data packets based on
the rules set by the SDN controller.
With centralization, SDN eliminates the need for each device to make independent
routing decisions. This architecture allows for the dynamic configuration and optimization
of network resources. Administrators can manage the entire network from a single console
or through automated software, enabling dynamic scaling and responding to demand.
In SDN-based systems, for instance, in [31–33], the control plane, which handles traffic
routing and management, operates as a centralized software module. As per [31], this
separation means adaptive traffic control with added flexibility and scalability. It also
helps with implementing machine learning models, as SDN controllers can monitor traffic
patterns easily. In general, it reduces operational complexity while allowing for automatic
load-balancing and dynamic resource allocation. Refs. [32,34] use a special component
in the controller, called the SDNWISE Flow Table, which specializes in WIreless SEnsors
(WISEs) and uses matches to filter traffic.
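The match-based filtering performed by such flow tables can be illustrated with a small sketch. The rule format and field names below are conceptual placeholders, not the actual SDNWISE API:

```python
# Conceptual sketch of match-based flow filtering in the spirit of an
# SDN-WISE-style flow table: each rule lists header-field matches and an
# action, and the first fully matching rule decides the packet's fate.
from dataclasses import dataclass

@dataclass
class FlowRule:
    match: dict   # header field -> required value
    action: str   # e.g. "FORWARD" or "DROP"

def apply_flow_table(packet: dict, table: list) -> str:
    """Return the action of the first rule whose match fields all agree
    with the packet; fall back to default-deny when nothing matches."""
    for rule in table:
        if all(packet.get(field) == value for field, value in rule.match.items()):
            return rule.action
    return "DROP"

table = [
    FlowRule(match={"dst_port": 1883, "proto": "TCP"}, action="FORWARD"),  # MQTT
    FlowRule(match={"proto": "ICMP"}, action="DROP"),  # e.g. block ping floods
]
```

The default-deny fallback is a design choice for the sketch; a real controller could equally forward unmatched packets to the control plane for a decision.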
Ref. [35] created an SDN-integrated pyramidal, multi-level, conceptually decentralized
multi-controller structured IDS system. The paper combines this structure with SDN data
plane configurations and ML techniques to achieve a robust, real-data-trained IDS. The
MULTI-BLOCK is a multi-layered defence strategy. By leveraging SDN, the framework
implements granular traffic control measures, isolating infected devices and disrupting
botnets. The goal is to contain attacks at the LAN level, minimizing the burden on central
controllers and servers, ensuring critical network availability. It prioritizes the protection
of controllers and distant nodes.
The first module is a controller-to-controller (C2C) communication framework within
SD-IoT networks. It reduces communication overhead and enhances intrusion attack
detection. A secure decentralized communication scheme is introduced, with the goal
of synchronized communication and minimized data exchange between controllers. Within
the realm of C2C communication, one interface encompasses general communications,
whereas the other is for alerts. By adopting this approach, the system effectively reduces
both data control overhead and communication overhead among controllers.
The second module, the proposed P4-Enabled Decentralized Traffic Monitoring System
(P4-DTMS), provides an efficient approach to managing IoT network traffic. The P4-DTMS
module introduces a pipeline, consisting of 24 P4-enabled state tables that capture specific
aspects of traffic. These state tables collectively contribute to creating a comprehensive
network overview and insights into the network’s traffic patterns. Their algorithm then
delves into the state table configuration.
The third module uses the extracted features to implement an ML-based, state-driven
attack identifier at the data plane stage. Deployed as a firewall, it first parses inbound
packets, recording and updating statistics. When the designated time window closes,
results are sent to the higher-level switch and the rules for forwarding or stopping packets
are updated; if the window period is still open, the process follows the existing protocols.
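The window-driven logic described above can be approximated in software as follows. This is a simplified analogue of the P4 state-table pipeline; the window length, the per-source packet-rate feature, and the threshold rule standing in for the trained model are all illustrative assumptions:

```python
# Simplified analogue of window-based traffic monitoring: packets are
# aggregated per source over a fixed time window, and the aggregates are
# scored when the window closes. A threshold rule stands in for the
# ML model that would run at the data plane.
from collections import defaultdict

class WindowedDetector:
    def __init__(self, window_s: float = 5.0, pkt_rate_threshold: float = 100.0):
        self.window_s = window_s
        self.threshold = pkt_rate_threshold
        self.window_start = None
        self.pkt_counts = defaultdict(int)

    def observe(self, timestamp: float, src_ip: str):
        """Record a packet; return {src: verdict} when a window closes,
        otherwise None (the window is still open)."""
        if self.window_start is None:
            self.window_start = timestamp
        if timestamp - self.window_start >= self.window_s:
            rates = {ip: n / self.window_s for ip, n in self.pkt_counts.items()}
            verdicts = {ip: ("DROP" if rate > self.threshold else "FORWARD")
                        for ip, rate in rates.items()}
            # Start a new window containing the current packet.
            self.pkt_counts.clear()
            self.window_start = timestamp
            self.pkt_counts[src_ip] += 1
            return verdicts
        self.pkt_counts[src_ip] += 1
        return None
```

A flooding source that exceeds the per-window packet-rate threshold is flagged for dropping when the window closes, while low-rate sources keep being forwarded.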
In [10,36,37], the control-plane detection is complemented by application-layer modules,
which are used to describe the requirements or desired behavior of the network. In
each case, this layer and the IDS module receive network data and perform ML tasks.
Mazhar et al. [37] specifically highlighted the performance and scaling capabilities of their
research, suggesting that high-throughput research centers could be beneficiaries. The
paper compares the performance of their IDS system in a centralized vs. distributed context
of IoT networks, and found that the IDS system tested uses fewer resources and is quite
suitable for low-power devices, such as IoT devices.
Ref. [38] establishes the GADAD (Genetic Algorithm DDoS Attack Detection) system,
which focuses on edge-based technologies in stateful SDN-based networks. The GADAD
system employs tree-based learning techniques and is designed to be deployed on edge
devices in IoT networks to detect both high- and low-volume DDoS attacks. There are
three main phases: network traffic preprocessing, feature engineering, and learning. In
the first phase, it captures network traffic data exchanged between sensors and the edge
server using Wireshark, and flow features are extracted using Zeek. The flow features are
effective in both high- and low-volume attacks compared to packet-based features. The
system introduces feature and depth tuning, a dual method that reduces memory usage
without compromising the system’s detection capabilities. These trained models are then
employed to detect and classify incoming network traffic data on the edge server.
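The feature-and-depth-tuning idea can be sketched with standard tooling. This is not the GADAD implementation; the feature count and depth cap are illustrative values chosen to show how such constraints bound model size:

```python
# Sketch of "feature and depth tuning": restrict a tree-based model to a
# small feature subset and a capped depth so the trained model fits the
# memory budget of an edge device. The feature count and depth here are
# illustrative, not the values used by GADAD.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.tree import DecisionTreeClassifier

def train_compact_tree(X: np.ndarray, y: np.ndarray,
                       k_features: int = 8, max_depth: int = 6):
    # Keep only the k most discriminative features (ANOVA F-score).
    selector = SelectKBest(f_classif, k=min(k_features, X.shape[1])).fit(X, y)
    # Cap tree depth: memory and inference cost grow with depth.
    clf = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    clf.fit(selector.transform(X), y)
    return selector, clf
```

At inference time, the same selector is applied to incoming flow features before the tree is queried, keeping both the stored model and the per-flow feature vector small.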
However, SDN networks introduce trade-offs, such as the centralized controller being a
single point of failure. Additionally, as noted in [10], low-resource IoT devices remain a
bottleneck despite the improvements SDN brings; hence, scaling issues can appear when
low-latency response times require optimized, high-performance algorithms. Ref. [35]
states that the standard SDN structure suffers from single-point-of-failure (SPOF) limitations
due to its single controller, which also hinders scalability and performance.

4.3. Edge–Industrial IoT (IIoT) and Wireless Sensor Networks (WSNs)


Edge computing is a term indicating data processing closer to the source of information.
These systems prioritize local computing and routing, minimizing latency and reducing
the load on centralized servers. Ref. [39] describes edge networks as an architecture
consisting of two components: edge servers and edge devices, where the latter forward
computationally intensive tasks and data to the former, closer-positioned servers, rather
than a central cloud or other servers.
While edge computing disperses the load of central computing across many different
nodes, these edge devices have limited resources, leading to lower efficiency in encryp-
tion/decryption and an overall lower quality of data collected from these nodes, as
highlighted in [40]. Edge computing represents a decentralized computing paradigm where
data processing occurs as close to the source of data generation as possible, rather than relying solely on
centralized infrastructure. This shift toward edge computing addresses several challenges
faced by IoT deployments, particularly those requiring low-latency responses and real-time
decision-making.
IIoT networks comprise the numerous sensors and interconnected devices that monitor
and control critical systems, such as manufacturing processes and plants, power grids,
etc. Combined with Edge, they form the category of
Edge–IIoT networks, as discussed in [41–43]. Ref. [43] describes IIoT networks as larger-
scale and more machine-based compared to small-scale home networks. The paper also
shows their disadvantages, such as how the heterogeneity of devices and protocols often
complicates integration.
Wireless Sensor Networks (WSNs), as described by [44], consist of spatially distributed
sensors that monitor and collect data about their environment. The paper highlights
problems and solutions of using the MQTT protocol; specifically, these networks are widely
used due to their scalability and ease of deployment. Energy efficiency is a primary
concern, however, as sensor nodes are often powered by batteries with limited lifespans.
Mishra et al. [44] describe utilizing a technology that had its limits: it took several minutes
before the test-case anomaly was discovered, which could lead to major losses in real
life. Their findings concur that while simulation has been very helpful in gathering data
and identifying abnormalities, there are still many other routes to explore to enhance
this research.
IIoT, Edge–IoT, and WSNs highlight the diversity of infrastructure approaches in
IoT deployments, each addressing specific performance, scalability, and security needs.
Edge–IoT prioritizes low-latency processing, IIoT emphasizes reliability and uptime in
industrial contexts, while WSNs provide a flexible and scalable framework for distributed
data collection. Despite their distinct focuses, these architectures often overlap in practice,
creating hybrid networks that leverage their combined strengths.

4.4. Blockchain-Integrated Networks (BINs)


Blockchain technology introduces a decentralized ecosystem that enhances security
and privacy in IoT networks. These networks can leverage features like immutable data
records, distributed consensus, and smart contracts.
The blockchain is a decentralized and immutable storage model that consists of all
transaction details that have been initiated by the peer node in the network, stored in a
decentralized distributed ledger. Any transaction processed is verified by the consent of
the majority of network devices [45].
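The tamper evidence provided by such a ledger rests on hash chaining, which can be shown in a minimal sketch (consensus, signatures, and networking are omitted here):

```python
# Minimal sketch of the append-only, hash-chained structure behind a
# blockchain ledger's tamper evidence: each block commits to the hash of
# its predecessor, so altering any recorded transaction invalidates
# every later link in the chain.
import hashlib
import json

def block_hash(block: dict) -> str:
    # Canonical JSON serialization so equal blocks hash identically.
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append_block(chain: list, transactions: list) -> None:
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "transactions": transactions})

def chain_is_valid(chain: list) -> bool:
    # Every block must reference the actual hash of its predecessor.
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))
```

Verifying the chain after any modification shows why stored records are immutable in practice: rewriting one block would require recomputing, and getting consensus on, every subsequent block.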
Studies such as [45–48] highlight how implementing blockchain-enabled solutions
eliminates the need for a trusted central authority by distributing control across all network
participants, which enhances fault tolerance and reduces the risk of single points of failure.
It also guarantees data integrity and a verifiable trail, building on the extra security of
cryptographic hashing and digital signatures. The automated contracts stored in the
blockchain can enforce predefined rules and execute actions without intervention, enabling
in-place decision-making and alerting or taking appropriate actions. Some proposed
systems integrate blockchain into their established IoT network [45,47], while others [46,48]
append it with a separate blockchain authentication managing network.
Ref. [49] utilizes blockchain technology in a smart city/urban environment; however,
it mentions that this approach introduces computational overhead, which can limit perfor-
mance in resource-constrained environments. Furthermore, storage requirements increase
as each node stores a complete copy of the blockchain ledger, which can be thousands of
devices. These resource-heavy protocols also introduce latency, throughput bottlenecks,
and higher energy consumption, which is not suitable for some use cases. Ref. [45] states
that blockchain is verifiable and immutable, yet it is vulnerable to different, increasingly
common attacks. DDoS attacks, often caused by flooding of the mempool (memory pool)
in a blockchain network, have severe consequences for legitimate users.

4.5. Medium-Sized Networks


In a two-part paper, Guerra-Manzanares et al. [50,51] explore IDS systems for medium-sized
networks, described as “up to 83 devices”. In [50], it is mentioned that these networks
represent a balance between scalability and performance. Such networks are often
employed in testbeds and simulations to evaluate IDS performance under realistic loads
without overwhelming computational resources.

In [51], Guerra-Manzanares et al. detail how the dataset was created for this specific
kind of circumstance. They state that this research aims to fill this substantial gap by
providing a novel IoT dataset acquired from a medium-sized IoT network architecture,
containing both real and emulated devices with 80 virtual devices and 3 physical devices
deployed. They mention that the size extension allows for the capture of malware spreading
patterns and interactions that cannot be observed in small-sized networks and that no
dataset uses the combination of emulated and real devices within the same network. The
dataset is composed of normal and actual botnet malicious network data acquired from
all the endpoints and servers during the initial propagation steps performed by Mirai,
BashLite, and Torii botnet malware.

4.6. Vehicle-Related Networks


Vehicle-related networks, also known as the Internet of Vehicles (IoV), are an infras-
tructure based on connectivity between vehicles with road-side units, such as cameras or
other electronic devices [52,53]. Smart vehicular networks improve many factors, such
as road safety, driving experience, and decision-making based on the collected informa-
tion, and are used to reduce accidents and increase the performance of driving. Securing
against the alteration, monitoring, and removal of vital messages has been a big concern
due to the vehicle’s connection over a wireless medium making several types of attacks
viable, including DoS/DDoS. Additional difficulties arise due to the traditional design of
vehicles, which often lacks comprehensive security considerations, particularly regarding
autonomous functionality and communication capabilities. Furthermore, the growing
number of networked vehicles increases the attack surface, introduces resource constraints,
and adds complexity to modern automotive systems.
For these reasons, reactive/predictive IDSs have recently received more attention.
Refs. [52,53] emphasize real-time communication and low-latency operations. These
networks are essential for autonomous vehicles and smart traffic systems, where the speed
of data exchange and reliability can impact safety and performance.
Gad et al. [53] train classifiers on the TON_IoT dataset and apply them to vehicular ad hoc
networks, showing strong results despite the dataset being general, supported by many
measures taken against overfitting. Their aim is to employ their model on both
vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) parts of the network.
Ullah et al. [52] used specific car-hacking datasets. In their paper, they propose an ML
model with high accuracy, with the intent for it to be deployed on the vehicles’ firewall.
In the intra-vehicle network, the internal smart devices of a vehicle communicate with
each other and control the communication of the vehicle in the inter-network. The hacker
can target the internal network of the vehicle, which is challenging to defend against, as
it is already met with high demand and resource requirements due to its low latency and
high-pace environment.

4.7. Home-Built Networks


One study took the approach of a custom-built IoT network deployed in a controlled
environment. Kalnoor and Gowrishankar [54] constructed a physical IoT setup in an office
space, featuring typical devices like smart cameras, thermostats, and sensors (the exact list
of peripherals can be found in Table A1).
This approach allows for real-world validation of IDS performance but is less common
due to higher resource demands; while it presents a case for robust, well-supported data,
the fixed setup can also sacrifice flexibility and the ability to cover edge cases.

4.8. Performance
For performance, it can be recognized that the majority of papers focus on the
machine learning aspect for optimization. The authors of [55] propose that their model is
much more effective than the previous state-of-the-art models, thanks to feature selection
methods and a feature dimensionality narrowed down to only 15 features. In comparison,
the authors of [56] defined novel traffic flow features for their ML model, which fit within
the limited resources of IoT network platforms.
Others, such as Ref. [57], use specific tools like SPARK, a widely popular data
engineering tool in Big Data contexts, for fast processing time and efficiency; utilized in the
cloud layer, it accelerates ML training, after which the model is deployed at the edge. Ref. [58]
states that the improved accuracy and application performance enabled by deep learning
can be used to spot novel attacks in IoT systems, which is vital in the context of securing
lightweight IoT networks. As for concrete numbers, when the authors of [59] measured their
ML model after training and installation on the network's firewall, it reached a speed of
classifying almost 3 million messages per second.
There have also been different approaches in terms of integrated deployment of IDSs,
or using an additional system to append the existing network. Ref. [46] uses a designated
Blockchain Server (BCS) bearing the responsibility of recording and validating transactions,
while Ref. [60] chooses to add a dedicated ML server to combat these attacks on the network.
In contrast, SDN-based systems often use their flexibility, as in [37], to integrate the IDS
into the already established network, where it is said that employing an SDN core enables
real-time intrusion detection and mitigation. In [34], Bhayo et al. adjusted the existing
network by adding a sink module to the IoT controller, containing a logging module
that logs all incoming packets in the forwarding layer. These logs are recorded in the
controller’s directory.

Table 2. Studies categorized by infrastructure.

Infrastructure # of Papers Citations


General IoT network, dataset-focused 32 [28–30,55–82]
Software-Defined Networks (SDNs) 9 [10,31–38]
Edge, Edge–IIoT, Wireless Sensor Network (WSN) 6 [39–44]
Blockchain-integrated networks (BINs) 5 [45–49]
Vehicle-related, Internet of Vehicles 2 [52,53]
Medium-sized networks 2 [50,51]
Home-built network 1 [54]

4.9. Conclusion
The analysis highlights that researchers aim to build adaptable solutions capable of
functioning across diverse environments, leading to minimal focus on specific deploy-
ments. In these general cases, high-quality data take precedence, often overshadowing
considerations related to network infrastructure.
Nevertheless, some architectural patterns stand out. For well-established technologies,
SDNs, Edge, and Edge–IIoT emerge as popular choices. SDNs have been widely adopted
due to their scalability and programmability. Similarly, edge computing has emerged as a
vital component in distributed computing by bringing computation closer to data sources,
reducing latency, and improving efficiency. Additionally, WSNs have been instrumental in
industrial contexts, supporting distributed sensor networks for real-time data collection
and analysis. While these technologies are widely known, blockchain has been an emerging
topic in research in recent years. Apart from its origins, blockchain and the ledger system’s
use cases have been explored since, and it has evolved into a versatile, decentralized
framework across various domains. It offers promising enhancements in security, but
introduces performance trade-offs, as it is costly to monitor all transactions.

Table 3. Studies assessed by their IDS deployment (only including ones that focused on, or made
remarks about, infrastructure).

Paper | Network Type | Deployment Strategy | Data Source | IDS Detection Location | Response Timing
[45] | BIN | Fog cloud IDS | Nodes + blockchain host | on site | instant
[46] | BIN | Separate blockchain network | Nodes + blockchain host | on site | instant
[47] | BIN | Transactions in blockchain blocks | Blockchain network | on site | instant
[48] | BIN | Gateway with blockchain network access | Gateway traffic | on site | instant
[49] | BIN, smart city | Blockchain Authenticator | Device + network | on site | instant
[39] | Edge–IIoT | Flow controller, edge server | Edge–IoT devices | on site | instant
[40] | Edge, sensors | IDS installed on cloud servers | Sensors | off site | after analysis
[43] | Edge–IIoT, 6LoWPAN | Sensor traffic-based | Sensors | off site | after analysis
[28] | General | Fog layer | IoT/non-IoT devices | - | -
[29] | General | Fog computing | - | - | -
[59] | General | Network firewall | - | - | -
[60] | General | Dedicated ML server | - | - | -
[30] | General | Fog/cloud layer + IoT gateway | Gateway traffic | on site | instant
[57] | General | Cloud-layer ML training via Spark, IDS in edge layer | - | on site | instant
[52] | IoV | Intra/inter-vehicle data | Combined datasets | - | -
[53] | IoV | VANET servers | Dataset-based | - | -
[51] | Medium-sized network | Monitoring server separately | Monitoring server | off site | after analysis
[50] | Medium-sized network | Monitoring server separately | Monitoring server | off site | after analysis
[44] | WSN, MQTT protocol | Integrated IDS | IoT devices | on site | after analysis + instant
[32] | SDN WISE | SDN WISE controller | SDN traffic data | on site | instant
[10] | SDN | SDN data + application plane | SDN traffic data | on site | instant
[36] | SDN | SDN control + application layer | SDN traffic data | on site | instant
[31] | SDN | SDN control plane | SDN traffic data | on site | instant
[33] | SDN | SDN control plane | SDN traffic data | on site | instant
[37] | SDN | SDN application layer | SDN traffic data | on site | instant
[34] | SDN WISE | SDN WISE control plane | SDN traffic data | on site | instant
[35] | SDN w/ PCDMCS | Decentralized IDS + SDN data plane | SDN traffic data | on site | instant
[38] | SDN, Edge | Edge servers | Gateway traffic | on site | instant
[54] | WSN, home-built | IoT gateway | Gateway traffic | on site | instant

5. Overview of Reviewed Datasets for ML-Powered IDS


Different datasets are necessary to train and test the machine learning algorithms,
which are the foundation of this literature review. This chapter aims to examine the data
more closely, highlighting the most used datasets, exploring them in more detail, and
identifying some common patterns and differences between them. All papers in this survey
were reviewed, and different insights came from this research.

Datasets are usually made up of one or more files that may represent a specific file
or attack type. The most common file types are pcap, csv, and txt. These files may have
the same number of features, records, and benign/attack flows, constituting a balanced
dataset; however, unbalanced datasets are far more common. From our research, only the
Edge-IIoTset dataset [83] came close to being considered balanced, the sole one out of the
eight most popular datasets shown in Figure 1. Using imbalanced datasets may
lead to biased trained models, though depending on the specifics of the ML algorithm and
data preprocessing, this does not have to be the case.
Generally, datasets can be divided into public and private, that is, those one can openly
access and use and those one cannot. Figure 2 showcases how many papers utilized public
or private datasets, excluding the papers that only include literature reviews. As is noted,
in one case [32], no dataset was specified whatsoever. The 53 papers that included public
datasets are listed in Table 4, alongside their year of publishing, the number of both records
and features, as claimed by the authors, and whether DDoS was included as an attack.
Note that due to the high variance in how these datasets are structured and created, some
values in the table may be approximations.

Figure 1. Dataset distribution among reviewed papers.

Furthermore, among the papers that included public datasets, on three occa-
sions [38,39,54], the paper also made use of self-generated or private datasets on top
of the public one to enhance or diversify the data or for testing purposes.
In the cases where self-generated or private datasets were used, the description of
how these datasets were acquired varied heavily. For example, Ref. [84] creates a network
traffic generator for IoT devices and tests both the proposed tool and the generated
malicious and benign data in the paper. In other cases [37,38], the paper describes which
existing tools were used to gather the data points and gives a general idea of the number
of flows, features, and/or traffic types that are part of the dataset.
The overwhelming majority of papers carried out some form of data processing,
e.g., feature filtering or extraction, as the typical first step. In general, the papers chose
a subset of both the features and flows available in the dataset. In fact, this step seems
to be the foundation in the entire domain of ML-based IDSs, as several papers [40,78]
were primarily written with the purpose of proposing improved data preprocessing. One
particular paper [72] stands out from the rest: a standard dataset was used, but the
network traffic data were turned into images, and computer vision ML was utilized to
identify DDoS attacks.
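The image-based idea can be illustrated as follows. This is a generic sketch (min-max scaling followed by square reshaping), not the exact procedure of the cited paper:

```python
# Illustrative sketch of turning a flow feature vector into a grayscale
# "image" for a vision model: min-max scale the features to the 0-255
# range and reshape them into a square grid, zero-padding up to the
# next perfect square. Not the exact procedure of any reviewed paper.
import math
import numpy as np

def flow_to_image(features: np.ndarray) -> np.ndarray:
    f = features.astype(float)
    span = f.max() - f.min()
    # Constant vectors map to an all-zero image rather than dividing by 0.
    scaled = np.zeros_like(f) if span == 0 else (f - f.min()) / span * 255
    side = math.ceil(math.sqrt(len(scaled)))
    padded = np.zeros(side * side)
    padded[: len(scaled)] = scaled
    return padded.reshape(side, side).astype(np.uint8)
```

The resulting arrays can then be fed to a standard convolutional classifier, letting image-domain architectures and augmentation techniques be reused for traffic data.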

Among the papers, several common themes were repeated. Firstly, we found
that multiple papers noted the general lack of publicly available, quality IoT datasets to be used
in their research. The stated reason for the lack of datasets was privacy, as large companies
tend to not wish to share their data with researchers [85]. In particular, in cases where an
industry-specific IoT device network traffic was necessary, the papers claimed there to be
no available datasets, which typically resulted in them creating and publishing their own,
as was the case for MedBIoT [51], or self-generating a private dataset, as the paper with a
focus on smart agriculture did [86]. Further concerns were noted regarding unbalanced
datasets that may have an uneven benign-to-attack traffic ratio or unlabeled datasets. While
the majority of the reviewed datasets explicitly included DDoS attacks, a small subset did
not specify their presence, instead relying on DoS or other flooding attacks that may serve
as proxies for DDoS. As shown in Table 4, most studies in our survey utilized datasets
that explicitly feature DDoS attacks. In contrast, a few older or industry-specific datasets,
such as X-IIoTID [87], MedBIoT [51], and KDDCUP 1999 [88]—which are also relatively
infrequent in the literature—either lacked explicit DDoS attack labels or did not provide
sufficient information to confirm their inclusion.

Figure 2. Overview of dataset distribution.

From Figure 1, we observe the most popular datasets to be BoT-IoT [26], TON_IoT [85],
and CIC-IDS2017 [89]. Other CIC datasets are also seen among the most popular datasets.
For the sake of brevity and clarity, TON_IoT and Bot-IoT will be described in greater detail
alongside the CIC datasets, with a special focus on CIC-IDS2017.

5.1. BoT-IoT
The BoT-IoT dataset [26] was created by the Cyber Range Lab of UNSW Canberra in
2019. The dataset is labeled, imbalanced, and was generated in a realistic testbed, usable for
both binary and multiclass classification. The researchers collected data from five simulated
IoT scenarios (weather station, motion-activated lights, garage door, smart fridge, and
smart thermostat) in a testbed environment. Originally, 32 features were collected, such as
IP or port addresses, from which an additional 14 new flow features were generated, like
total or average bytes per IP, totalling 46 features for the dataset. Five types of attacks are
represented in the dataset, including DDoS, DoS, OS and Service Scan, Keylogging, and
Data exfiltration. BoT-IoT contains over 73,000,000 attack instances but just under
10,000 normal traffic instances; of the attack records, over 38,000,000 account for DDoS
attacks specifically. This severe imbalance has prompted researchers in the domain to
develop solutions to overcome the challenges surrounding the use of the dataset, either
by merging it with other datasets [90] or by applying various balancing algorithms [91].
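The balancing step can be as simple as random undersampling of the majority class. The sketch below uses scikit-learn's `resample` on synthetic stand-in data (not the actual BoT-IoT records) to shrink the attack class to the size of the benign class:

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(42)

# Synthetic stand-in for a BoT-IoT-style split: far more attack than benign rows.
attack = rng.normal(1.0, 1.0, size=(50_000, 4))   # majority class
benign = rng.normal(0.0, 1.0, size=(500, 4))      # minority class

# Random undersampling: shrink the majority class to the minority size.
attack_down = resample(attack, replace=False, n_samples=len(benign), random_state=42)

X = np.vstack([attack_down, benign])
y = np.array([1] * len(attack_down) + [0] * len(benign))
```

Undersampling discards information from the majority class; oversampling or class-weighting are common alternatives when every attack record matters.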
Algorithms 2025, 18, 209 18 of 34

Table 4. Table of public datasets (✓: includes DDoS attacks in the dataset; -: not included
or unknown).

Dataset Year DDoS # of Records # of Features References of Papers


KDDCUP 1999 [88] 1999 - 4,898,431 41 [78]
DARPA2000 [92] 2000 ✓ 200,000+ 41 [54]
NSL-KDD [93] 2009 - 150,000+ 43 [33,66]
ISCX 2012 [94] 2012 ✓ 2,450,324 - [31]
CTU-13 [95] 2014 ✓ 15,000,000 - [31]
UNSW-NB15 [96] 2015 - 250,000+ 48 [56,57,66,70]
SNMP-MIB Dataset [97] 2016 - 4998 34 [44]
CIC-IDS2017 [89] 2017 ✓ 2,830,743 80 [29,33,52,64,68,69,71,79]
N_BaIoT [98] 2018 - 7,062,606 23 [40]
Mirai [99] 2018 ✓ 750,000+ 115 [56]
CSE-CIC-IDS2018 [100] 2018 ✓ 16,232,943 79–83 [52,68,69,80]
DS2OS [101] 2018 ✓ 355,902 11 [76]
CIC-DDoS2019 [102] 2019 ✓ 94,000+ 74 [39,49,61,65,68,72]
BoT-IoT [26] 2019 ✓ 73,360,900 46 [28,30,45,48,55,56,58,61–63,67,73,75,76,78,82]
IoT23 [103] 2020 ✓ 325,307,990 23 [57,60,81]
TON_IoT [85] 2020 ✓ 22,000,000+ 22–52 [35,38,53,55–58,61,75]
IoTID20 [104] 2020 - 100,000+ 86 [36]
Application-Layer DDoS Dataset [105] 2020 - 346,869 78 [47]
IoT-CIDDS [77] 2021 ✓ 95,299 21 [77]
ETF IoT Botnet [106] 2021 - 2245 9 [56]
Edge-IIoTset [83] 2022 ✓ 20,952,648 61 [10,35,41–43]
X-IIoTID [87] 2022 - 820,834 68 [35]
MedBIoT [51] 2022 - 17,845,567 100 [50,51]
CICIoT2023 [107] 2023 ✓ 45,000,000+ 47 [46,59,74,76]

5.2. TON_IoT
The TON_IoT dataset [85] was also created by the Cyber Range Lab of UNSW Canberra
in 2020. The dataset is labeled, unbalanced, and was generated from an IoT/IIoT network
testbed. The dataset contains data from heterogeneous sources, gaining its name (TON)
from the data it includes: telemetry, operating systems, and network. The researchers
include simulated sensor data from seven IoT/IIoT sensors (weather station, motion-
activated lights, garage door, smart fridge, smart thermostat, Modbus service, and GPS)
as well as real devices: two phones and a smart TV. Nine types of attacks are represented
in the dataset, including Scanning, DoS, DDoS, ransomware, backdoor, data injection,
Cross-site Scripting, password cracking attack, and Man-in-The-Middle. There are over
22,000,000 total data records, of which just under 800,000 are normal traffic, which means
TON_IoT is also an imbalanced dataset. Over 6,000,000 of the records are from DDoS
attacks. Since the dataset contains multiple sub-datasets with unique processed or raw
data [25], it is not clear how many features the dataset holds in total, though the combined
dataset called combined_IoT_dataset proposed by the original paper [85] uses a total of
22 features.
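A quick way to quantify this kind of imbalance is to compute the class shares directly. The toy frame below is illustrative only; the column names `label` and `type` are assumptions for the sketch, not necessarily TON_IoT's actual schema:

```python
import pandas as pd

# Toy frame mimicking TON_IoT-style labels: 'label' (0 = normal, 1 = attack)
# and 'type' (attack category). Column names are illustrative.
df = pd.DataFrame({
    "label": [1, 1, 1, 0, 1, 1, 0, 1],
    "type":  ["ddos", "dos", "ddos", "normal", "backdoor", "ddos", "normal", "xss"],
})

ratio = df["label"].value_counts(normalize=True)        # attack vs. normal share
per_class = df.loc[df["label"] == 1, "type"].value_counts()  # records per attack type
```

On the real dataset, the same two lines reveal both the benign-to-attack ratio and how the attack records spread across the nine attack types.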

5.3. CIC-IDS2017
All of the CIC datasets are created by the Canadian Institute for Cybersecurity, some-
times collaborating with an external institution [27]. In the case of CIC-IDS2017 [89], it is
the first IDS dataset that the CIC created, back in 2017. The dataset is labeled, imbalanced,
and was generated based on the most common attacks in 2016. The researchers aimed to
create naturalistic benign background traffic and set out to mimic a real network traffic
capture in a complete network configuration. Seven types of attacks are represented in
the dataset, including DoS, DDoS, Brute Force, Heartbleed, Web, Infiltration, and Botnet.
Almost 2,300,000 of the records account for benign traffic, while the remaining roughly
500,000 are attacks, with DoS (250,000) and DDoS (128,000) accounting for over half of
the attack records. The researchers mention that the dataset contains over 80 features,
though all analyses with this dataset, including the researchers' own [89], extract only
80 or fewer features. Given its skewed benign-to-attack ratio, this dataset is also
considered imbalanced, and, similar to the previously discussed datasets, different
mitigation strategies for this imbalance have been employed in further research [108].

6. ML Performance Review in IDS


6.1. ML Performance Comparison
This chapter reviews various ML techniques for DDoS attack detection, comparing
their performance across multiple datasets. The analysis aims to identify patterns and
determine the most effective models for this task.
For each paper reviewed, the analysis prioritized the highest-performing models
as reported for a given dataset. When a model was evaluated across multiple datasets,
preference was given to the dataset containing the largest number of records, as larger
datasets often provide more reliable performance insights. Although the BoT-IoT dataset is
frequently used in these studies, this review preferred the TON_IoT dataset. This choice
reflects the higher number of classes in the TON_IoT dataset in comparison to BoT-IoT
(Section 5).
However, identifying the “best” model without a settled benchmark is not an easy
task. The performance metrics used, such as accuracy, precision, and recall, often vary
significantly depending on the dataset characteristics, evaluation methodologies, experi-
mental setups, and preprocessing. Understanding the specific contexts and limitations of
each dataset is crucial. These are discussed in detail in Section 5.
It is also important to note that the optimal models for binary classification tasks are not
always the best for multiclass classification, as these tasks aim for different objectives and
face various dataset characteristics, e.g., Ref. [45] proposed XGBoost for binary classification
and Random Forest (RF) for multiclass classification.
Inconsistent reporting of evaluation metrics across studies (e.g., [78]) is another chal-
lenge, which complicates direct comparisons. DDoS detection can be approached as either
binary (e.g., distinguishing between attack and normal traffic) or multiclass (e.g., identi-
fying specific types of traffic or attack categories). To perform analysis, specific metrics
were extracted from each study, including accuracy, precision, recall, and F1 score, when-
ever reported. These metrics were selected based on their usability for the task and their
ability to provide an even measure of model performance, particularly in the event of
class imbalances.
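The gap between accuracy and F1 under imbalance is easy to demonstrate. In this hedged sketch, a degenerate classifier that predicts "attack" for every flow still scores 90% accuracy, while the macro-averaged F1 score exposes the missed benign class:

```python
from sklearn.metrics import accuracy_score, f1_score

# Imbalanced toy labels: 9 attack flows, 1 benign flow. A classifier that
# predicts "attack" for everything looks excellent on accuracy alone.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
y_pred = [1] * 10

acc = accuracy_score(y_true, y_pred)                                   # flattering
f1_macro = f1_score(y_true, y_pred, average="macro", zero_division=0)  # reveals the miss
```

Here `acc` is 0.9 while `f1_macro` falls below 0.5, because the benign class contributes an F1 of zero to the macro average.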
Some studies, such as [31], were excluded from this comparison due to ambiguity or
the absence of precise numerical metrics.
The datasets employed in the reviewed studies were examined to provide context
for interpreting the findings. This was followed by the construction of comprehensive
performance Tables 5 and 6. These sorted tables, presented on the basis of accuracy for both
binary and multiclass classification tasks, reveal several significant trends in the field. A
particular observation is the prevalence of above-90% accuracy across the majority of studies,
with only two cases dropping below this rate ([36,66]). This high performance often stems
from sophisticated methods such as feature engineering, dimensionality reduction, and
algorithm-level optimizations, which play a significant role in DDoS detection tasks.

Table 5. Binary classification performance (sorted by accuracy).

Paper Accuracy [%] Precision [%] Recall [%] F1 Score [%] Highest Accuracy ML Technique Dataset(s)
[61] 100 100 100 100 DT TON_IoT
[55] 100 100 100 100 RF TON_IoT
[43] 100 100 100 100 RF Edge-IIoT
[59] 100 100 100 100 DT CICIoT2023
[59] 100 100 100 100 RF CICIoT2023
[45] 99.9987 100 - 99.9993 XGBoost BoT-IoT
[74] 99.99 99.99 99.99 99.99 KNN, LR, DNN CICIoT2023
[78] 99.99 - - - JRip classifier BoT-IoT
[80] 99.98 99 89 - DT CSE-CIC-IDS2018
[33] 99.96 99.99 100 100 XGBoost CIC-IDS2017
[82] 99.96 99.96 99.96 99.96 RF BoT-IoT
[47] 99.95 99.66 99.61 99.63 RF Application-Layer DDoS Dataset
[44] 99.94 99.87 99.83 99.85 RF SNMP-MIB Dataset
[76] 99.92 - - - Stacking using XGBoost BoT-IoT
[67] 99.9 99.9 99.9 99.9 RF BoT-IoT
[10] 99.79 99.09 99.77 99.43 RF Edge-IIoTset
[72] 99.74 100 99 100 CNN, VGG19 CIC-DDoS2019
[71] 99.73 99.97 93.3 96.54 MLP CIC-IDS2017
[40] 99.7 99.7 99.7 99.7 RF detection_of_IoT_botnet_attacks_N_BaIoT
[36] 99.7 98.3 - 85.9 CNN IoTID20
[52] 99.51 99.6 99.52 99.51 Hybrid Deep Learning (LSTM + DENS + GRU) CIC DoS, CIC-IDS2017, CSE-CIC-IDS2018
[57] 99.45 - - 99.52 RF UNSW-NB15, IoT23, TON_IoT
[53] 99.1 98.4 99.1 98.7 XGBoost TON_IoT
[28] 99 100 99 99.16 EWMA + CUSUM + KNN BoT-IoT
[63] 99 99 100 99 RF BoT-IoT
[58] 99 100 100 100 SVM TON_IoT
[64] 98.78 99.03 99.35 98.48 CNN + LSTM CIC-DDoS2019
[46] 98.69 99 - 99 XGBoost CICIoT2023
[77] 98.6 - 98.7 98 RF IoT-CIDDS
[86] 98.5 - - - SVM + Bagged Trees generated dataset
[68] 98.5 98.5 98.4 98.44 TabNet algorithm CIC-DDoS2019
[32] 98.2 - - - NB custom
[35] 98.17 97.63 98.08 97.90 EWEA TON_IoT
[37] 98 100 98 99 SVM custom
[49] 97.39 - - - DT + RF + SVM CIC-DDoS2019
[75] 97.31 95.8 96.88 96.39 Stacking (DT, MLP RProp, Logistic Regression) TON_IoT
[29] 97.16 97.41 99.1 - CNN + LSTM CICIDS2017
[50] 97.06 97.31 97.02 97 RF MedBIoT
[54] 97 98 - 95.9 variational dynamic Bayesian algorithm + HMM DARPA2000
[51] 95.32 95.8 95.32 94.81 RF MedBIoT
[38] 95 95 95 94 GADAD-ET TON_IoT
[65] 94.5 93.3 95.3 94.3 K-means + Gaussian mixture + one-class SVM CIC-DDoS2019
[66] 88.73 89.92 88.73 88.53 APSO-CNN-SE UNSW-NB15

Table 6. Multiclass classification performance (sorted by accuracy).

Paper Accuracy [%] Precision [%] Recall [%] F1 Score [%] Highest Accuracy ML Technique Dataset
[38] 100 100 - 100 GADAD-RF TON_IoT
[55] 100 100 100 100 RF TON_IoT
[69] 100 100 100 100 BoostedEnsML (LightGBM + XGBoost) CSE-CIC-IDS2018
[45] 99.985 99.996 - 99.997 RF BoT-IoT
[52] 99.97 99.98 99.97 99.98 Hybrid Deep Learning (LSTM + DENS + GRU) BoT-IoT
[76] 99.95 - - - EnsembleVoting (RF, DT, ET, XGBoost) CIC DoS, CIC-IDS2017, CSE-CIC-IDS2018
[82] 99.95 99.95 99.95 99.95 RF BoT-IoT
[61] 99.9 99.9 99.9 99.9 DT TON_IoT
[67] 99.9 99.9 99.9 99.9 RF BoT-IoT
[81] 99.89 99.95 99.92 99.94 KNN IoT23
[30] 99.8 - - - RF BoT-IoT
[79] 99.7 - - - DP-model CIC-IDS2017
[62] 99.6 - - - MLP algorithm Bot-IoT
[63] 99 99 99 99 KNN Bot-IoT
[58] 99 99 99 99 LSTM TON_IoT
[39] 98.9 99.47 99.31 99.35 LSTM CIC-DDoS2019
[35] 98.72 97.81 97.35 98.26 EWEA Edge-IIoTset
[53] 98.5 98.2 95.9 97.4 KNN TON_IoT
[34] 98.1 - - - DT custom
[57] 97.81 - - 97.81 RF UNSW-NB15, IoT23, TON_IoT
[51] 97.66 98.24 97.66 96.57 RF MedBIoT
[75] 96.32 93.12 84.55 88.63 Voting (DT, MLP RProp, Logistic Regression) TON_IoT
[50] 96.17 96.92 96.17 96.02 RF MedBIoT
[70] 95.59 79 64 68 Extratree UNSW_NB15
[41] 94.21 - - - J48 Edge-IIoTset
[36] 86.1 - - 75.8 CNN IoTID20
[66] 78.35 81.79 78.35 77.65 APSO-CNN-SE UNSW-NB15

There were significant performance variations between ML models. For binary
classification, RF, DT, and XGBoost were the overall best performers. In contrast, for
multiclass classification, RF, K-Nearest Neighbors (KNN), and Long Short-Term Memory
(LSTM) models performed better, suggesting that the flexibility and robustness of these
methods make them more suitable for the increased complexity of the multiclass problem.
These findings confirm the importance of algorithmic selection in maximizing outcomes.
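A minimal comparison harness along these lines, using scikit-learn stand-ins (DT, RF, and KNN on synthetic imbalanced data rather than any study's actual pipeline), might look like:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic flow-like features; real studies would use BoT-IoT/TON_IoT records.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)

models = {
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": KNeighborsClassifier(),
}

# Macro-F1 under cross-validation, to avoid both accuracy's imbalance bias
# and a single lucky train/test split.
scores = {name: cross_val_score(m, X, y, cv=5, scoring="f1_macro").mean()
          for name, m in models.items()}
```

Scoring every candidate with the same metric, folds, and data is exactly the kind of standardized comparison the reviewed studies often lack.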
While the majority of studies emphasize accuracy as the primary measure, it is sensitive
to class distribution and can be misleading under the high class imbalances present in
these datasets. In response to these limitations, numerous studies have placed emphasis on
measures such as the F1 score, which balances precision and recall. This shift underscores
the need for standard reporting practices, since they enable fair and meaningful
comparisons among studies and support the development of robust DDoS
detection methodologies.
It is hard to compare the time complexity of different machine learning techniques
across various datasets due to differences in the implementation, hardware, and nature of
the datasets. Some of the papers provided us with ADT (Average Detection Time) but most
did not. To address this, we chose to calculate the Average Validation Time (AVT)—the
time it took to run models on test datasets, grouped by particular datasets. Although AVT
is a useful rough estimate of model efficiency, it is by no means flawless. It does not
account for preprocessing steps or the specific feature sets used, both of which can have a
significant impact. Nevertheless, we believe AVT offers useful insight into the relative
computational expense of different models on the provided datasets.
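AVT as used here can be approximated by timing the prediction pass over a held-out test set and dividing by the number of records. The sketch below is a simplified illustration of that measurement, not any paper's exact protocol:

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# Time only the prediction pass, excluding training and preprocessing.
start = time.perf_counter()
model.predict(X_test)
elapsed = time.perf_counter() - start

avt_ms = elapsed / len(X_test) * 1000.0   # average validation time per record, in ms
```

Averaging over several timed runs (and pinning hardware details) would make such numbers more comparable across studies.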

6.2. Binary Classification


6.2.1. Observations
Tables 7–9 indicate that most of the studies achieved an accuracy rate higher than
90%, with a few achieving near-perfect accuracy. More advanced preprocessing techniques
like dimensionality reduction and feature engineering often resulted in improved perfor-
mance. DT, RF, and ensemble algorithms like XGBoost repeatedly ranked high in the
best-performing category.
In the TON_IoT dataset, DT and RF achieved perfect accuracy, benefiting from em-
ployed preprocessing in [55,61]. Similarly, XGBoost demonstrated exceptional accuracy
(99.9987%) on the BoT-IoT dataset, underscoring its robustness [45]. This can also be at-
tributed to carefully designed features and a controlled attacking setting. High performance
often reflects an ideal scenario where distinguishing features are quite clear. However,
the controlled nature of the dataset might not capture the full complexity found in real-
world environments.

Table 7. Binary classification performance for BoT-IoT (sorted by accuracy); AVT: Average Valida-
tion Time.

Paper Accuracy [%] Precision [%] Recall [%] F1 Score [%] AVT [ms] Highest Accuracy ML Technique
[45] 99.9987 100 - 99.9993 0.0011 XGBoost
[78] 99.99 - - - - JRip classifier
[82] 99.96 99.96 99.96 99.96 - RF
[76] 99.92 - - - - Stacking using XGBoost
[67] 99.9 99.9 99.9 99.9 0.0011 RF
[40] 99.7 99.7 99.7 99.7 - RF
[28] 99 100 99 99.16 - EWMA + CUSUM + KNN
[63] 99 99 100 99 - RF

Table 8. Binary classification performance for TON_IoT (sorted by accuracy); AVT: Average Validation
Time, ADT: Average Detection Time.

Paper Accuracy [%] Precision [%] Recall [%] F1 Score [%] AVT [ms] ADT [ms] Highest Accuracy ML Technique
[61] 100 100 100 100 - 130 DT
[55] 100 100 100 100 0.3173 - RF
[53] 99.1 98.4 99.1 98.7 - - XGBoost
[58] 99 100 100 100 - - SVM
[35] 98.17 97.63 98.08 97.90 - 14.22 EWEA
[75] 97.31 95.8 96.88 96.39 - - Stacking (DT, MLP RProp, Logistic Regression)
[38] 95 95 95 94 0.0074 - GADAD-ET

Table 9. Binary classification performance for CIC-DDoS2019 (sorted by accuracy); AVT: Average
Validation Time.

Paper Accuracy [%] Precision [%] Recall [%] F1 Score [%] AVT [ms] Highest Accuracy ML Technique
[72] 99.74 100 99 100 37.02 CNN, VGG19
[64] 98.78 99.03 99.35 98.48 - CNN + LSTM
[68] 98.5 98.5 98.4 98.44 0.1234 TabNet algorithm
[49] 97.39 - - - - DT + RF + SVM
[65] 94.5 93.3 95.3 94.3 - K-means + Gaussian mixture+one-class SVM

The CIC-DDoS2019 dataset showed the worst accuracies and the widest performance
spread of the three datasets. This variation likely reflects the complexity of the DDoS
attack patterns present in the dataset. Deep learning techniques such as CNNs appear
better suited to this complexity, as they can exploit spatial or sequential representations
of the data.
The general tendency in the performance of the studied models indicates that mod-
els based on neural networks require more time and impose higher computational over-
head [61], while XGBoost, DT, RF, and ET are faster. However, in the CIC-DDoS2019 dataset,
CNN-based models outperformed tree-based and SVM-based alternatives (Table 9).

6.2.2. Challenges in Comparison


Many studies prioritize accuracy, neglecting other key metrics like recall, precision,
or F1 score, masking the impact of class imbalance in the datasets. The choice of dataset
significantly affects model performance due to variations in traffic volume and class distri-
bution. Hyperparameter tuning and dataset-specific optimizations further limit the validity
of direct comparisons.

6.2.3. Trends and Insights, Tables 7–9


Ensemble models such as RF and XGBoost are usually the preferred choices in binary
classification problems since they have the capability of efficiently picking up complex
patterns. Custom preprocessing methods, as seen by studies [61,72], significantly enhance
performance, highlighting the importance of dataset-specific tweaks.
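A dataset-specific preprocessing chain of the kind these studies describe can be sketched as a scikit-learn pipeline. The scaler, PCA width, and classifier below are illustrative choices, not the configurations of the cited papers:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=3000, n_features=40, n_informative=10,
                           random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

# Preprocessing and the classifier live in one pipeline, so the scaler and PCA
# are fitted on training data only and reapplied consistently at test time.
pipe = Pipeline([
    ("scale", StandardScaler()),        # normalize feature ranges
    ("reduce", PCA(n_components=10)),   # dimensionality reduction
    ("clf", RandomForestClassifier(n_estimators=100, random_state=1)),
]).fit(X_tr, y_tr)

score = pipe.score(X_te, y_te)
```

Packaging the tweaks this way also makes them reproducible, which is exactly what cross-study comparison needs.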

6.3. Multiclass Classification


6.3.1. Observations
Similar to binary classification, many studies also achieved high accuracy in multiclass
classification. Models based on the RF technique dominate on the BoT-IoT dataset: all of
the studies using this dataset for multiclass classification exceeded 99% accuracy.
In the case of the TON_IoT dataset, the techniques used are more diverse. RF (including
GADAD-RF) achieved perfect scores of 100% [38,55]. Decision Trees and LSTM were just
behind, with scores of 99.9 [61] and 99 [58], respectively. However, the approaches using
KNN and voting performed worse (F1 scores: 97.4 [53] and 88.63 [75]), suggesting that
increased model complexity did not translate into better predictions on this dataset.
Unfortunately, the studies covering NN models generally did not report testing time, so
we were unable to benchmark them against tree-based models for multiclass classification.
However, Table 10 shows that the hybrid deep learning model of [52] ran approximately
45 and 19 times slower than the tree-based equivalents of [67] and [45], respectively.

Table 10. Multiclass classification performance for BoT-IoT (sorted by accuracy); AVT: Average
Validation Time, ADT: Average Detection Time.

Paper Accuracy [%] Precision [%] Recall [%] F1 Score [%] AVT [ms] ADT [ms] Highest Accuracy ML Technique
[45] 99.985 99.996 - 99.997 0.0026 - RF
[52] 99.97 99.98 99.97 99.98 0.05 - Hybrid Deep Learning (LSTM + DENS + GRU)
[82] 99.95 99.95 99.95 99.95 - - RF
[67] 99.9 99.9 99.9 99.9 0.0011 - RF
[30] 99.8 - - - - 36.9 RF
[62] 99.6 - - - - - MLP algorithm
[63] 99 99 99 99 - - KNN

6.3.2. Challenges in Comparison


Incomplete reporting of metrics, such as missing F1 scores, limits the reliability of
comparisons. Skewed class distributions in multiclass datasets inflate accuracy but obscure
the performance of minority classes.

6.3.3. Trends and Insights, Tables 10 and 11


While RF dominates multiclass tasks, models based on neural networks such as
LSTM [52,58] demonstrate growing potential, particularly for sequential data. Tailored
preprocessing and feature engineering are critical, as evidenced by studies [38,61]. Standard-
ized reporting practices remain essential for ensuring fair comparisons and reproducibility.

Table 11. Multiclass classification performance for TON_IoT (sorted by accuracy); AVT: Average
Validation Time, ADT: Average Detection Time.

Paper Accuracy [%] Precision [%] Recall [%] F1 Score [%] AVT [ms] ADT [ms] Highest Accuracy ML Technique
[38] 100 100 - 100 0.022 - GADAD-RF
[55] 100 100 100 100 0.795 - RF
[61] 99.9 99.9 99.9 99.9 - 170 DT
[58] 99 99 99 99 - - LSTM
[53] 98.5 98.2 95.9 97.4 - - KNN
[75] 96.32 93.12 84.55 88.63 - - Voting (DT, MLP RProp, Logistic Regression)

7. Discussion
7.1. Datasets
The reviewed studies reveal a strong preference for the BoT-IoT and TON_IoT datasets
due to their suitability for both binary and multiclass classification tasks. Both datasets
offer rich, labeled data and are gathered from real IoT/IIoT environments and are therefore
highly relevant for cybersecurity research. However, their significant class imbalances pose
challenges that require additional preprocessing or algorithmic strategies to ensure balanced
performance across classes. The BoT-IoT dataset, with its smaller number of attack classes,
is particularly popular, while the more diverse TON_IoT dataset provides opportunities to
evaluate model performance across a wider range of scenarios. Comparatively, datasets
like CIC-IDS2017 are also used, but their specificity to certain attack types and features
makes them less generally applicable across all IoT usage. The distribution of datasets
among the reviewed papers is summarized in Figure 2.
There is no significant difference in the popularity of datasets used for binary and
multiclass classification tasks. BoT-IoT remains the most popular dataset, followed very
closely by TON_IoT, as both are suitable for either purpose. Binary classification (labeling
data flows as benign or attack) appears to be tackled more often (42 studies) than multiclass
classification, which distinguishes between different attack types (27 studies). The
distributions of datasets employed for binary and multiclass classification are presented in
Figure 3 and Figure 4, respectively.

Figure 3. Distribution of datasets used for comparison of binary classification.



Figure 4. Distribution of datasets used for comparison of multiclass classification.

7.2. Machine Learning Techniques


In binary classification tasks, the Random Forest (RF) and Decision Tree (DT) algo-
rithms perform exceptionally well, along with XGBoost. However, for multiclass clas-
sification, XGBoost is notably absent as a standalone model. Simple tree-based mod-
els appear to outperform others, likely because of their robustness against overfitting.
Figure 5 and Figure 6 present the distribution of ML techniques employed for binary and
multiclass classification, respectively.

Figure 5. Distribution of ML techniques used for comparison of binary classification.

Figure 6. Distribution of ML techniques used for comparison of multiclass classification.



7.3. Resource Usage


The hardware used varies from basic laptops with low-end configurations to high-
performance computing (HPC) facilities and cloud setups, as can be seen in Table A1.
For instance, the majority of the studies used personal computers or laptops with Intel
Core i5 or i7 processors and 8–32 GB of RAM. While these setups are sufficient for small
datasets and general machine learning models, they struggle to handle large datasets or
deep, computationally intensive models. Alternatively, highly resource-consuming systems
such as HPC clusters or systems with components with GPUs such as NVIDIA Tesla or
RTX4070Ti will be in a position to offer researchers an accelerated method of dealing with
big datasets along with computer-resource-intensive models such as CNNs and LSTMs.
One of the most noticeable trends is the dominance of Windows operating systems
in most experiments, with some experiments being conducted using Ubuntu or other
Linux operating systems in instances where cloud or server-based setups were used.
Programming environments consist largely of Python-based libraries (e.g., Scikit-learn
and TensorFlow), confirming Python's status as the standard language for ML research.
Other approaches, such as using Rust in the interest of efficiency, were reported but
remain uncommon.
The choice of infrastructure can significantly impact the scalability, efficiency, and
overall feasibility of experiments. HPC or cloud-based experiments benefit from minimized
computation time and task parallelization. This is particularly evident in experiments based
on deep learning frameworks or utilizing large datasets such as BoT-IoT and TON_IoT.
However, personal computer or laptop experiments rely on straightforward models such
as Decision Trees or Random Forests due to hardware limits.

7.4. Performance
Machine learning models have consistently demonstrated superb performance, with
median accuracy levels of approximately 99% on binary and multiclass tasks. This per-
formance is best illustrated using the BoT-IoT dataset, most likely due to it having fewer
classes and properly designed features, which make it highly trainable. Decision Tree
(DT) and Random Forest (RF) models tend to perform well, particularly in binary clas-
sification, where overfitting resistance is an asset. In multiclass scenarios, tree-based
ensemble techniques are also used very effectively, although their application differs across
datasets. However, the inconsistencies observed with datasets such as TON_IoT suggest
that dataset characteristics, including class distribution and feature diversity, significantly
impact model performance. Cross-validation across multiple datasets remains crucial to
ensuring robustness and the ability to adapt to different conditions.

7.5. Outlook
As this review has indicated, researchers suffer from a shortage of suitable data on which
to base their work in this domain. Regarding datasets, researchers will continue to fill the
gaps in industry-specific datasets as IoT devices gain popularity. As with the IIoT
datasets [83,87] and the MedBIoT dataset [51] published in the past few years, research will
increasingly target specialized devices and network traffic rather than attempting to
encompass large general datasets. At the same time, more general datasets will continue
to be published, such as those from the CIC, though they will focus on the common attacks
and vulnerabilities identified in recent years. There is no indication that recent datasets
are growing in complexity (whether in the number of records, features, or devices
captured), so this trend is likely to continue. It is tied to the resource-constrained training,
testing, and deployment capabilities of current research institutes.

In terms of infrastructure and deployment, SDNs represent a paradigm shift in the
management of IoT networks, offering the flexibility and scalability required to tame the
complexity of today's IoT ecosystems, and are well positioned to play a central role in
facilitating effortless deployment and adaptive security frameworks. We can anticipate
greater utilization of blockchain and edge computing paradigms in the years ahead,
particularly as IoT systems grow in size and complexity. By addressing infrastructure
constraints with robust data-centric approaches, researchers can continue to advance the
field of intrusion detection systems.
The fields of ML and DL are continuously evolving. Our study examined the top ML
models for this particular task, but forecasting which methods will be applied in the future
is still difficult. Nevertheless, it is clear that, aside from the selection of ML techniques,
preprocessing of data and feature selection are important factors and will continue to
improve. To simplify future research, using standardized datasets and evaluation metrics
is highly recommended. Also, in terms of the IDS, it is important that the system is able to
provide real-time detection, so one possible future research area is to include this in ML
models in a standardized way. The development of ML algorithms could also mean easier
deployment even in more constricted infrastructure environment conditions.

8. Future Research Direction


This systematic literature review highlights some gaps in the research and in the
direction of improving machine learning-based intrusion detection systems (IDSs) for
DDoS attacks in IoT networks. Filling these will result in developing improved and
effective security solutions for IoT systems.

8.1. Enhancing Dataset Diversity and Realism


This review emphasizes a strong reliance on benchmark datasets such as BoT-IoT and
TON_IoT, which may not capture the diversity and variability of real-world IoT traffic.
Future efforts should be directed towards developing and distributing large-scale
real-world datasets that reflect heterogeneous IoT devices and evolving attack patterns.
Furthermore, synthetic data generation methods such as Generative Adversarial Networks
(GANs) could be used to simulate rare attack patterns and improve dataset diversity.
These steps should enhance the capability of ML models to generalize across multiple
datasets rather than being fine-tuned against a single one. Future research could also
explore ways to adapt models so they transfer well between datasets.
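Short of training a full GAN, the core idea of synthetic minority generation can be illustrated with simple interpolation between minority samples. This is a SMOTE-style sketch; the function name and parameters are ours for illustration, not a library API:

```python
import numpy as np

def interpolate_minority(X_min, n_new, k=5, seed=0):
    """SMOTE-style oversampling: place new points on the segment between a
    minority sample and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        dists = np.linalg.norm(X_min - X_min[i], axis=1)
        j = rng.choice(np.argsort(dists)[1:k + 1])  # skip the point itself
        out.append(X_min[i] + rng.random() * (X_min[j] - X_min[i]))
    return np.array(out)

# Toy stand-in for rare attack-class records (20 samples, 4 features).
rare = np.random.default_rng(1).normal(size=(20, 4))
synthetic = interpolate_minority(rare, n_new=100)
```

GAN-based generators pursue the same goal (plausible rare-attack samples) but learn the data distribution rather than interpolating within it.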

8.2. Advanced Preprocessing and Feature Engineering


Future research should aim to identify optimal feature subsets that maximize detection
accuracy while reducing computational complexity and investigate streaming data prepro-
cessing techniques that enable real-time DDoS detection in dynamic IoT environments.
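One common realization of feature-subset selection is filter-based ranking by mutual information. The sketch below keeps the `k` most informative features on synthetic data; the value of `k` here is an arbitrary illustration, not a recommended setting:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = make_classification(n_samples=1000, n_features=30, n_informative=6,
                           random_state=0)

# Keep the 10 features carrying the most mutual information with the label;
# the rest are dropped before training, cutting inference cost.
selector = SelectKBest(mutual_info_classif, k=10).fit(X, y)
X_reduced = selector.transform(X)
```

The resulting smaller feature vector is what makes real-time, per-flow scoring on constrained IoT hardware plausible.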

8.3. Advancements in Machine Learning Models


Future machine learning research for intrusion detection systems (IDSs) will have to
balance performance and explainability. Though classic models like Decision Trees (DTs)
and Random Forest (RF) demonstrated outstanding accuracy, more advanced techniques
can further improve detection. Researchers should explore methods that better explain
model decisions, providing clear insight into decision-making to increase trust and
usability in practical applications. In addition, future work should investigate federated
learning as a privacy-preserving alternative in which IDS models are trained on distributed
IoT nodes without centralizing sensitive data. For better generalization, domain adaptation
techniques should also be introduced so that models perform reasonably well across
various datasets and real-world environments.
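The federated averaging step at the heart of such schemes can be sketched in a few lines. This toy `fedavg` aggregates locally trained parameter vectors weighted by client sample counts; the function and setup are illustrative, not a specific framework's API:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Federated averaging: combine per-client parameter vectors, weighting
    each by its local sample count. Raw data never leaves the clients."""
    sizes = np.asarray(client_sizes, dtype=float)
    W = np.stack(client_weights)
    return (W * (sizes / sizes.sum())[:, None]).sum(axis=0)

# Three IoT nodes, each contributing locally trained linear-model coefficients.
w_global = fedavg(
    [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])],
    client_sizes=[100, 100, 200],
)
```

In a full system, this aggregation would repeat over many rounds, with the server broadcasting `w_global` back to the nodes between rounds.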

8.4. Resource-Efficient and Lightweight ML Models


Since IoT devices have limited computing power, it is crucial to design lightweight
and energy-efficient IDSs. This involves investigating the effects of reducing computational
overhead on detection accuracy and response time for actual deployment in the field.
Model compression, pruning, quantization, and knowledge distillation are some of the
methods that can be applied to make models simpler without compromising their threat
detection abilities.
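Cost-complexity pruning is one such simplification available out of the box in scikit-learn. The sketch below contrasts an unpruned tree with a pruned one on synthetic data; the `ccp_alpha` value is an arbitrary illustration:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X, y)
# Cost-complexity pruning: larger ccp_alpha removes subtrees that add little
# impurity reduction, shrinking the model's memory and inference footprint.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X, y)

nodes_full, nodes_pruned = full.tree_.node_count, pruned.tree_.node_count
```

Comparing `nodes_pruned` against `nodes_full`, and the corresponding test accuracies, quantifies the size-versus-detection trade-off this subsection describes.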

8.5. Scalable and Secure Deployment Strategies


The trade-offs between cloud-based and edge-based IDS architectures, considering
latency, security, and scalability, would be a worthwhile area for future research.
Blockchain technology, if implemented in a computationally efficient way, can offer
secure, tamper-proof logging of network traffic data for distributed and transparent
attack detection.

9. Conclusions
This study explored more than 60 relevant papers, with the aim of providing a replicable,
comprehensive PRISMA literature review of the machine learning techniques, deployment
options, and datasets used in recent papers on IDSs for DDoS attacks in IoT networks.
As a starting point, this study laid out the foundational knowledge
about the domain, including a summary of findings from previous literature reviews on
the topic. The next chapters explored details of IDS deployment, providing a clear view
of the many options researchers have used in their studies and highlighting issues with
resource-heavy infrastructure choices. Similarly, the datasets used were listed, with the
most popular ones described in full detail, aiding in the analysis of the performance results
of different machine learning algorithms. For the machine learning chapter, both binary
and multiclass classification were analyzed, and a comparative study on the performance
of the algorithms was conducted.
This literature review discussed the findings, common trends, and important insights
in Section 7, which provides the reader with a comprehensive overview of this paper’s
answers to the posed research questions. This chapter ends with an outlook on the expected
trends in this research domain, which could be used by future researchers to conduct
further studies and fill gaps in the current knowledge.

Author Contributions: Conceptualization, M.B.B., S.D., M.K., M.B.L., M.P., G.C. and N.D.; methodology, M.B.B., S.D., M.K., M.B.L., M.P., G.C. and N.D.; validation, M.B.B., S.D., M.K. and M.B.L.; investigation, M.B.B., S.D., M.K. and M.B.L.; resources, M.B.B., S.D., M.K. and M.B.L.; data curation, M.B.B., S.D., M.K. and M.B.L.; writing—original draft preparation, M.B.B., S.D., M.K., M.B.L., M.P., G.C. and N.D.; writing—review and editing, M.P., G.C. and N.D.; visualization, M.B.B., S.D., M.K., M.B.L., M.P., G.C. and N.D.; supervision, M.P., G.C. and N.D.; project administration, M.P., G.C. and N.D.; funding acquisition, N.D. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Conflicts of Interest: The authors declare no conflicts of interest.



Appendix A
Table A1. Studies and their ML training systems.

Paper   ML Training HW + SW
[61]    HP notebook with Windows 10 Pro Enterprise 64-bit, Intel(R) Core(TM) i7-5500 CPU with two cores and four logical processors, 16 GB RAM, and 14.6 GB of virtual memory; PyCharm 2022.2 and Python 3.10
[55]    Intel(R) Core(TM) i7-9750H CPU @ 2.60 GHz, 2592 MHz, six cores, twelve logical processors, 16 GB RAM, and NVIDIA GeForce GTX 1660 Ti with Max-Q Design, 4 GB
[62]    HP computer with 2.50 GHz Intel(R) Core(TM) i7-6500U CPU, 8 GB RAM, and Windows 10
[28]    Windows 10, 16 GB RAM, Intel(R) Core(TM) i7-8650U CPU at 2.11 GHz, Jupyter Notebook, scikit-learn
[63]    No specs; Python
[64]    Keras/TensorFlow on NVIDIA Tesla V100 GPUs (16 GB VRAM) with 256 GB RAM on 10 nodes of an HPC cluster
[65]    High-Performance Computing (HPC) facility at the University of Huddersfield, UK
[42]    No specs
[66]    Windows 10, Intel(R) Core(TM) i7-10700K CPU at 3.80 GHz, 32 GB RAM, Python 3.8, torch 1.8.0
[67]    DELL (Inspiron 13 5000) laptop, Windows 10, Intel(R) Core(TM) i5-8250U CPU @ 1.60 GHz, 1.80 GHz, 8.00 GB RAM
[68]    No specs
[69]    Intel(R) Core(TM) i7-7700 CPU @ 3.60 GHz, 4 cores, 16 GB RAM (15.9 GB usable), Windows 10, NVIDIA GeForce GTX 1050 Ti GPU
[71]    Intel Core i7 processor (3.6 GHz quad-core), 1 TB hard disk, 32 GB RAM, Windows 11, Python v3.6
[72]    HP laptop with a 2.9 GHz Intel Core i7-7500U CPU, 8 GB RAM, Python
[73]    No specs
[29]    64-bit Intel Core i7 CPU with 16 GB RAM, Windows 7; TensorFlow for deep learning, machine learning algorithms as implemented in MATLAB 2017a
[74]    No specs; Rust program
[59]    64-bit ARM CPU, scikit-learn with Python 3.9
[75]    8th-generation Intel Core i7 CPU, 32 GB RAM, NVIDIA Quadro M2000M GPU
[56]    No specs
[76]    Google Colab
[57]    Local desktop computer (64-bit, 16 GB RAM, Core i7), Java (JDK) 11, Hadoop 2.7, Spark v3.0, PySpark 3.0
[78]    Weka 3.8.3 (Waikato Environment for Knowledge Analysis), 32 GB RAM workstation with Intel Xeon E3-1271 v3 CPU @ 3.60 GHz, scikit-learn library in Python
[39]    12 processors, six kernels, 32 GB RAM, 64-bit Windows 10; each processor is an Intel Core i7-8750H CPU @ 2.20 GHz, 2201 MHz
[60]    No specs
[79]    No specs
[30]    2x Intel Xeon E5-2650 v4 CPUs, 256 GB RAM, CentOS Linux 7, Python 3.9.6
[80]    64-bit Windows 10
[81]    No specs
[51]    No specs
[82]    MacBook Pro, Apple M1 chip, 16 GB RAM

Table A2. Studies and their ML training systems (continued).

Paper   ML Training HW + SW
[86]    Asus notebook, Kali Linux 2020.4, 8 GB of primary memory and Core i5 CPU
[43]    No specs
[40]    PHP 5.3.13 on Intel Core i7 CPU at 2.40 GHz, 2 GB RAM, Windows 10; Apache Server 2.2.22 to implement servers, MySQL 5.5.24
[77]    No specs
[41]    No specs
[45]    Tyrone PC run by Intel(R) Xeon(R) Silver 4114 CPU @ 2.20 GHz (2 processors), 128 GB RAM, and 2 TB hard disk
[46]    Windows, Intel i7-13700K CPU @ 3.40 GHz with 32 GB memory, NVIDIA GeForce RTX 4070 Ti GPU with 12 GB memory
[47]    No specs
[48]    Simulated using Python 3.6.5 on a PC with i5-8600K, 250 GB SSD, GeForce 1050 Ti 4 GB, 16 GB RAM, and 1 TB HDD
[52]    Intel Core i5 8th-generation laptop, Python 3.0 simulation
[53]    Python 3.8, Windows 10, Core i7, 16 GB RAM
[49]    Intel(R) Core(TM) i7-8550U CPU @ 1.80 GHz, 16 GB RAM, Ubuntu 18.04 LTS
[58]    No specs
[50]    No specs
[38]    Python 3.9, Windows 10, Intel Core i7 CPU, 16 GB RAM
[70]    Australian Centre for Cyber Security (ACCS) Cyber Range Lab, employing IXIA PerfectStorm technology to create a hybrid model of everyday activities
[32]    No specs
[10]    Ubuntu Server 20.04 LTS virtual machine on an Intel Core i5-1135G7 processor, 12 GB RAM, and a Microsoft Windows 10 host
[36]    Ubuntu on Raspberry Pi 3
[31]    No specs
[33]    No specs
[35]    Intel Core i7-1355U CPU, 12 GB RAM, virtual machine running Ubuntu 20.04.6 LTS
[37]    No specs
[34]    Ubuntu v16.0.2, Intel(R) Core(TM) i7-3540M 3.00 GHz CPU, 4.0 GB RAM
[44]    No specs; simulation
[54]    Built-at-home setup: ARRIS TM822A modem, NETGEAR R6300v2 wireless router, TP-Link AC1750 dual-band wireless router, NETGEAR ProSAFE Plus GS105Ev2 switch

References
1. Sinha, S. State of IoT 2024: Number of Connected IoT Devices Growing 13% to 18.8 Billion Globally. 2024. Available online:
https://iot-analytics.com/reports-databases/ (accessed on 30 November 2024).
2. Marr, B. 2024 IoT and Smart Device Trends: What You Need to Know for the Future. Forbes, 19 October 2023.
3. Executive, P. Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Website. Available online:
https://www.prisma-statement.org/ (accessed on 30 November 2024).
4. Executive, P. Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Flow Diagram. Available online:
https://www.prisma-statement.org/prisma-2020-flow-diagram (accessed on 30 November 2024).
5. Executive, P. Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Checklist. Available online:
https://www.prisma-statement.org/prisma-2020-checklist (accessed on 30 November 2024).
6. DTU FindIt Database. Available online: https://findit.dtu.dk/ (accessed on 30 November 2024).
7. DTU FindIt Data Providers. Available online: https://findit.dtu.dk/en/about/providers/ (accessed on 30 November 2024).
8. Desbiens, F. What Is IoT? In Building Enterprise IoT Solutions with Eclipse IoT Technologies; Apress: New York, NY, USA, 2023;
pp. 3–23. [CrossRef]
9. Tawalbeh, L.; Muheidat, F.; Tawalbeh, M.; Quwaider, M. IoT Privacy and Security: Challenges and Solutions. Appl. Sci. 2020,
10, 4102. [CrossRef]
10. Khedr, W.I.; Gouda, A.E.; Mohamed, E.R. FMDADM: A Multi-Layer DDoS Attack Detection and Mitigation Framework Using
Machine Learning for Stateful SDN-Based IoT Networks. IEEE Access 2023, 11, 28934–28954. [CrossRef]
11. Mothukuri, V.; Khare, P.; Parizi, R.M.; Pouriyeh, S.; Dehghantanha, A.; Srivastava, G. Federated-Learning-Based Anomaly
Detection for IoT Security Attacks. IEEE Internet Things J. 2022, 9, 2545–2554. [CrossRef]
12. IEEE Xplore. IEEE Xplore Digital Library. 2024. Available online: https://ieeexplore.ieee.org/Xplore/home.jsp (accessed on 30
November 2024).
13. Dulhare, U.N.; Ahmad, K.; Ahmad, K.A.B. What is Machine Learning? In Machine Learning and Big Data: Concepts, Algorithms,
Tools, and Applications; John Wiley & Sons: Hoboken, NJ, USA, 2020.
14. Zhang, J.; Li, F.; Ye, F. An ensemble-based network intrusion detection scheme with bayesian deep learning. In Proceedings of
the ICC 2020—2020 IEEE International Conference on Communications (ICC), Dublin, Ireland, 7–11 June 2020; pp. 1–6.
15. Duke, D. What is the difference between Denial-of-Service (DoS) and Distributed-Denial-of-Service (DDoS)? Netw. Secur. 2002,
2002, 4. [CrossRef]
16. Nazario, J. DDoS attack evolution. Netw. Secur. 2008, 2008, 7–10. [CrossRef]
17. Lee, N. Intrusion Detection System. In Encyclopedia of Computer Graphics and Games; Springer International Publishing: Cham,
Switzerland, 2024; p. 1008. [CrossRef]

18. Dutta, N.; Jadav, N.; Tanwar, S.; Sarma, H.K.D.; Pricop, E. Intrusion Detection Systems Fundamentals. In Cyber Security: Issues and
Current Trends; Springer: Singapore, 2022; pp. 101–127. [CrossRef]
19. Rafique, S.H.; Abdallah, A.; Musa, N.S.; Murugan, T. Machine Learning and Deep Learning Techniques for Internet of Things
Network Anomaly Detection—Current Research Trends. Sensors 2024, 24, 1968. [CrossRef] [PubMed]
20. Mishra, N.; Pandya, S. Internet of Things Applications, Security Challenges, Attacks, Intrusion Detection, and Future Visions:
A Systematic Review. IEEE Access 2021, 9, 59353–59377. [CrossRef]
21. Zaman, S.; Tauqeer, H.; Ahmad, W.; Shah, S.M.A.; Ilyas, M. Implementation of Intrusion Detection System in the Internet of
Things: A Survey. In Proceedings of the 2020 IEEE 23rd International Multitopic Conference (INMIC), Bahawalpur, Pakistan,
5–7 November 2020; pp. 1–6. [CrossRef]
22. Abinaya, M.; Prabakeran, S.; Kalpana, M. Comparative Evaluation on Various Machine Learning Strategies Based on Identification
of DDoS Attacks in IoT Environment. In Proceedings of the 2023 9th International Conference on Advanced Computing and
Communication Systems (ICACCS), Coimbatore, India, 17–18 March 2023; Volume 1, pp. 1814–1821. [CrossRef]
23. Vivek, V.; Veeravalli, B. A Survey on Machine Learning Approaches for Intrusion Detection in Cloud Computing Environments
for Improving Routing Payload Security and Network Privacy. In Proceedings of the 2024 IEEE International Conference on
Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT), Bali, Indonesia, 4–6 July 2024; pp. 79–85. [CrossRef]
24. Walling, S.; Lodh, S. A Survey on Intrusion Detection Systems: Types, Datasets, Machine Learning methods for NIDS and Chal-
lenges. In Proceedings of the 2022 13th International Conference on Computing Communication and Networking Technologies
(ICCCNT), Kharagpur, India, 3–5 October 2022; pp. 1–7. [CrossRef]
25. Moustafa, N. The TON_IoT Datasets. 2021. Available online: https://research.unsw.edu.au/projects/toniot-datasets
(accessed on 30 November 2024).
26. Koroniotis, N.; Moustafa, N.; Sitnikova, E.; Turnbull, B. Towards the development of realistic botnet dataset in the Internet of
Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst. 2019, 100, 779–796. [CrossRef]
27. Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. CICIDS2017 Dataset. 2018. Available online: https://www.unb.ca/cic/datasets/
ids-2017.html (accessed on 30 November 2024).
28. Alzahrani, R.; Alzahrani, A. A Novel Multi Algorithm Approach to Identify Network Anomalies in the IoT Using Fog Computing
and a Model to Distinguish between IoT and Non-IoT Devices. J. Sens. Actuator Netw. 2023, 12, 19. [CrossRef]
29. Roopak, M.; Yun Tian, G.; Chambers, J. Deep Learning Models for Cyber Security in IoT Networks. In Proceedings of the 2019
IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 7–9 January 2019;
pp. 452–457. [CrossRef]
30. Katsura, Y.; Endo, A.; Kakiuchi, M.; Arai, I.; Fujikawa, K. Lightweight Intrusion Detection Using Multiple Entropies of Traffic
Behavior in IoT Networks. In Proceedings of the 2022 IEEE Global Conference on Artificial Intelligence and Internet of Things
(GCAIoT), Alamein New City, Egypt, 18–21 December 2022; pp. 138–145. [CrossRef]
31. Kumar, J.; Arul Leena Rose, P.J. Mitigate Volumetric DDoS Attack using Machine Learning Algorithm in SDN based IoT Network
Environment. Int. J. Adv. Comput. Sci. Appl. 2023, 14. [CrossRef]
32. Jyothsna, V. Defending Against IoT Threats: A Comprehensive Framework with Advanced Models and Real-Time Threat
Intelligence for DDoS Detection. In Proceedings of the 2024 2nd International Conference on Networking and Communications
(ICNWC), Chennai, India, 2–4 April 2024; pp. 1–7. [CrossRef]
33. Ferrão, T.; Manene, F.; Ajibesin, A.A. Multi-Attack Intrusion Detection System for Software-Defined Internet of Things Network.
Comput. Mater. Contin. 2023, 75, 4985–5007. [CrossRef]
34. Bhayo, J.; Shah, S.A.; Hameed, S.; Ahmed, A.; Nasir, J.; Draheim, D. Towards a machine learning-based framework for DDOS
attack detection in software-defined IoT (SD-IoT) networks. Eng. Appl. Artif. Intell. 2023, 123, 106432. [CrossRef]
35. Toony, A.A.; Alqahtani, F.; Alginahi, Y.; Said, W. MULTI-BLOCK: A novel ML-based intrusion detection framework for
SDN-enabled IoT networks using new pyramidal structure. Internet Things 2024, 26, 101231. [CrossRef]
36. Tawfik, M.; Al-Zidi, N.M.; Alsellami, B.; Al-Hejri, A.M.; Nimbhore, S. Internet of Things-Based Middleware Against Cyber-
Attacks on Smart Homes using Software-Defined Networking and Deep Learning. In Proceedings of the 2021 2nd International
Conference on Computational Methods in Science & Technology (ICCMST), Mohali, India, 17–18 December 2021; pp. 7–13.
[CrossRef]
37. Mazhar, N.; Saleh, R.; Zaba, R.; Zeeshan, M.; Hameed, M.M.; Khan, N. R-IDPS: Real Time SDN-Based IDPS System for IoT
Security. Comput. Mater. Contin. 2022, 73, 3099–3118. [CrossRef]
38. Saiyed, M.; Al Anbagi, I. A Genetic Algorithm- and t-Test-based system for DDoS Attack Detection in IoT Networks. IEEE Access
2024, 12, 25623–25641. [CrossRef]
39. Jia, Y.; Zhong, F.; Alrawais, A.; Gong, B.; Cheng, X. FlowGuard: An Intelligent Edge Defense Mechanism Against IoT DDoS
Attacks. IEEE Internet Things J. 2020, 7, 9552–9562. [CrossRef]

40. Hikal, N.A.; Elgayar, M.M. Enhancing IoT Botnets Attack Detection Using Machine Learning-IDS and Ensemble Data Preprocess-
ing Technique. In Proceedings of the Internet of Things—Applications and Future, Agartala, Tripura, India, 3–4 February 2020;
Ghalwash, A.Z., El Khameesy, N., Magdi, D.A., Joshi, A., Eds.; Springer: Singapore, 2020; pp. 89–102.
41. Haque, S.; El-Moussa, F.; Komninos, N.; Muttukrishnan, R. Identification of Important Features at Different IoT layers for
Dynamic Attack Detection. In Proceedings of the 2023 IEEE 9th Intl Conference on Big Data Security on Cloud (BigDataSecurity),
IEEE Intl Conference on High Performance and Smart Computing, (HPSC) and IEEE Intl Conference on Intelligent Data and
Security (IDS), New York, NY, USA, 6–8 May 2023; pp. 84–90. [CrossRef]
42. Alghanmi, N.; Alotaibi, R.; Buhari, S.M. Anomaly Detection in IoT Networks: Machine Learning Approaches for Intrusion
Detection. Wirel. Pers. Commun. 2022, 122, 2309–2324. [CrossRef]
43. Fikriansyah, M.I.; Amatullah Karimah, S.; Setiadi, F. Detection of DDOS Attacks in IIoT Case Using Machine Learning Algorithms.
In Proceedings of the 2024 International Conference on Data Science and Its Applications (ICoDSA), Kuta, Bali, Indonesia,
10–11 July 2024; pp. 117–121. [CrossRef]
44. Mishra, S.; Albarakati, A.; Sharma, S.K. Cyber Threat Intelligence for IoT Using Machine Learning. Processes 2022, 10, 2673.
[CrossRef]
45. Kumar, R.; Kumar, P.; Tripathi, R.; Gupta, G.P.; Garg, S.; Hassan, M.M. A distributed intrusion detection system to detect DDoS
attacks in blockchain-enabled IoT network. J. Parallel Distrib. Comput. 2022, 164, 55–68. [CrossRef]
46. Hızal, S.; Akhter, A.S.; Çavuşoğlu, Ü.; Akgün, D. Blockchain-based IoT security solutions for IDS research centers. Internet Things
2024, 27, 101307. [CrossRef]
47. Ibrahim El Sayed, A.; Abdelaziz, M.; Hussein, M.; Elbayoumy, A.D. DDoS Mitigation in IoT Using Machine Learning and
Blockchain Integration. IEEE Netw. Lett. 2024, 6, 152–155. [CrossRef]
48. Alrayes, F.S.; Aljebreen, M.; Alghamdi, M.; Alrslani, F.A.F.; Alshuhail, A.; Almukadi, W.S.; Basheti, I.; Sharif, M.M. Harnessing
blockchain with ensemble deep learning-based distributed dos attack detection in iot-assisted secure consumer electronics
systems. Fractals 2024, 32, 09n10. [CrossRef]
49. Babu, E.S.; BKN, S.; Nayak, S.R.; Verma, A.; Alqahtani, F.; Tolba, A.; Mukherjee, A. Blockchain-based Intrusion Detection System
of IoT urban data with device authentication against DDoS attacks. Comput. Electr. Eng. 2022, 103, 108287. [CrossRef]
50. Guerra-Manzanares, A.; Medina-Galindo, J.; Bahsi, H.; Nõmm, S. Using MedBIoT Dataset to Build Effective Machine Learning-
Based IoT Botnet Detection Systems. In Proceedings of the Information Systems Security and Privacy, Online, 9–11 February
2022; Furnell, S., Mori, P., Weippl, E., Camp, O., Eds.; Springer: Cham, Swizterland, 2022; pp. 222–243.
51. Guerra-Manzanares, A.; Medina-Galindo, J.; Bahsi, H.; Nõmm, S. MedBIoT: Generation of an IoT Botnet Dataset in a Medium-
sized IoT Network. In Proceedings of the 6th International Conference on Information Systems Security and Privacy—ICISSP,
INSTICC, Valletta, Malta, 25–27 February 2020; SciTePress: Setúbal, Portugal, 2020; pp. 207–218. [CrossRef]
52. Ullah, S.; Khan, M.A.; Ahmad, J.; Jamal, S.S.; e Huma, Z.; Hassan, M.T.; Pitropakis, N.; Arshad; Buchanan, W.J. HDL-IDS: A
Hybrid Deep Learning Architecture for Intrusion Detection in the Internet of Vehicles. Sensors 2022, 22, 1340. [CrossRef]
53. Gad, A.R.; Nashat, A.A.; Barkat, T.M. Intrusion Detection System Using Machine Learning for Vehicular Ad Hoc Networks Based
on ToN-IoT Dataset. IEEE Access 2021, 9, 142206–142217. [CrossRef]
54. Kalnoor, G.; Gowrishankar, S. A model for intrusion detection system using hidden Markov and variational Bayesian model for
IoT based wireless sensor network. Int. J. Inf. Technol. 2022, 14, 2021–2033. [CrossRef]
55. Sadhwani, S.; Manibalan, B.; Muthalagu, R.; Pawar, P. A Lightweight Model for DDoS Attack Detection Using Machine Learning
Techniques. Appl. Sci. 2023, 13, 9937. [CrossRef]
56. Chandana Swathi, G.; Kishor Kumar, G.; Siva Kumar, A. Ensemble classification to predict botnet and its impact on IoT networks.
Meas. Sens. 2024, 33, 101130. [CrossRef]
57. Alghamdi, R.; Bellaiche, M. Evaluation and Selection Models for Ensemble Intrusion Detection Systems in IoT. IoT 2022,
3, 285–314. [CrossRef]
58. Khanday, S.A.; Fatima, H.; Rakesh, N. Implementation of intrusion detection model for DDoS attacks in Lightweight IoT
Networks. Expert Syst. Appl. 2023, 215, 119330. [CrossRef]
59. Thereza, N.; Ramli, K. Development of Intrusion Detection Models for IoT Networks Utilizing CICIoT2023 Dataset. In Proceedings
of the 2023 3rd International Conference on Smart Cities, Automation & Intelligent Computing Systems (ICON-SONICS), Bali,
Indonesia, 6–8 December 2023; pp. 66–72. [CrossRef]
60. Khan, A.; Sharma, I. Guardians of the IoT: A Symphony of Ensemble Learning for DDoS Attack Resilience. In Proceedings of the
2023 4th International Conference on Computation, Automation and Knowledge Management (ICCAKM), Dubai, United Arab
Emirates, 12–13 December 2023; pp. 1–6. [CrossRef]
61. Ayad, A.G.; Sakr, N.A.; Hikal, N.A. A hybrid approach for efficient feature selection in anomaly intrusion detection for IoT
networks. J. Supercomput. 2024, 80, 26942–26984. [CrossRef]
62. Rbah, Y.; Mahfoudi, M.; Balboul, Y.; Chetioui, K.; Fattah, M.; Mazer, S.; Elbekkali, M.; Bernoussi, B. A machine learning based
intrusions detection for IoT botnet attacks. AIP Conf. Proc. 2023, 2814, 030012. [CrossRef]

63. Churcher, A.; Ullah, R.; Ahmad, J.; ur Rehman, S.; Masood, F.; Gogate, M.; Alqahtani, F.; Nour, B.; Buchanan, W.J. An Experimental
Analysis of Attack Classification Using Machine Learning in IoT Networks. Sensors 2021, 21, 446. [CrossRef]
64. Roopak, M.; Tian, G.Y.; Chambers, J. An Intrusion Detection System Against DDoS Attacks in IoT Networks. In Proceedings of
the 2020 10th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 6–8 January
2020; pp. 0562–0567. [CrossRef]
65. Roopak, M.; Parkinson, S.; Tian, G.Y.; Ran, Y.; Khan, S.; Chandrasekaran, B. An unsupervised approach for the detection of
zero-day distributed denial of service attacks in Internet of Things networks. IET Netw. 2024, 13, 513–527. [CrossRef]
66. Ban, Y.; Zhang, D.; He, Q.; Shen, Q. APSO-CNN-SE: An Adaptive Convolutional Neural Network Approach for IoT Intrusion
Detection. Comput. Mater. Contin. 2024, 81, 567–601. [CrossRef]
67. Tyagi, H.; Kumar, R. Attack and anomaly detection in IoT networks using supervised machine learning approaches. Rev.
D’Intelligence Artif. 2021, 35, 11–21. [CrossRef]
68. Zegarra Rodríguez, D.; Daniel Okey, O.; Maidin, S.S.; Umoren Udo, E.; Kleinschmidt, J.H. Attentive transformer deep learning
algorithm for intrusion detection on IoT systems using automatic Xplainable feature selection. PLoS ONE 2023, 18, 286652.
[CrossRef]
69. Okey, O.D.; Maidin, S.S.; Adasme, P.; Lopes Rosa, R.; Saadi, M.; Carrillo Melgarejo, D.; Zegarra Rodríguez, D. BoostedEnML:
Efficient Technique for Detecting Cyberattacks in IoT Systems Using Boosted Ensemble Machine Learning. Sensors 2022, 22, 7409.
[CrossRef] [PubMed]
70. Vijayalakshmi, M.; Susmanth Srinivas, A.; Ramanathan, S. Building a Smarter Shield: Using Ensemble Learning for Multi-Class
DDoS Attacks. In Proceedings of the 2024 15th International Conference on Computing Communication and Networking
Technologies (ICCCNT), Kamand, India, 24–28 June 2024; pp. 1–6. [CrossRef]
71. Chukwukelu, G.; Essien, A.; Salami, A.; Utuk, E. Comparative Analysis of Machine Learning Techniques for DDoS Intrusion
Detection in IoT Environments. In Proceedings of the 21st International Conference on Smart Business Technologies—ICSBT,
INSTICC, Dijon, France, 9–11 July 2024; SciTePress: Setúbal, Portugal, 2024; pp. 19–27. [CrossRef]
72. Gebrye, H.; Wang, Y.; Li, F. Computer vision based distributed denial of service attack detection for resource-limited devices.
Comput. Electr. Eng. 2024, 120, 109716. [CrossRef]
73. Alabsi, B.A.; Anbar, M.; Rihan, S.D.A. Conditional Tabular Generative Adversarial Based Intrusion Detection System for Detecting
Ddos and Dos Attacks on the Internet of Things Networks. Sensors 2023, 23, 5644. [CrossRef]
74. Berqia, A.; Bouijij, H.; Merimi, A.; Ouaggane, A. Detecting DDoS Attacks using Machine Learning in IoT Environment. In
Proceedings of the 2024 International Conference on Intelligent Systems and Computer Vision (ISCV), Fez, Morocco, 8–10 May
2024; pp. 1–8. [CrossRef]
75. Hajla, S.E.; Ennaji, E.M.; Maleh, Y.; Mounir, S. Enhancing IoT network defense: Advanced intrusion detection via ensemble
learning techniques. Indones. J. Electr. Eng. Comput. Sci. 2024, 35, 2010–2020. [CrossRef]
76. Mante, J.; Kolhe, K. Ensemble of tree classifiers for improved DDoS attack detection in the Internet of Things. Math. Model. Eng.
Probl. 2024, 11, 2355–2367. [CrossRef]
77. Kamaldeep; Malik, M.; Dutta, M. Feature Engineering and Machine Learning Framework for DDoS Attack Detection in the
Standardized Internet of Things. IEEE Internet Things J. 2023, 10, 8658–8669. [CrossRef]
78. Nimbalkar, P.; Kshirsagar, D. Feature selection for intrusion detection system in Internet-of-Things (IoT). ICT Express 2021,
7, 177–181. [CrossRef]
79. Morshedi, R.; Matinkhah, S.M.; Sadeghi, M.T. Intrusion Detection for IoT Network Security with Deep learning. J. AI Data Min.
2024, 12, 37–55. [CrossRef]
80. Ullah, S.; Mahmood, Z.; Ali, N.; Ahmad, T.; Buriro, A. Machine Learning-Based Dynamic Attribute Selection Technique for DDoS
Attack Classification in IoT Networks. Computers 2023, 12, 115. [CrossRef]
81. Pham, V.T.; Nguyen, H.L.; Le, H.C.; Nguyen, M.T. Machine Learning-based Intrusion Detection System for DDoS Attack in the
Internet of Things. In Proceedings of the 2023 International Conference on System Science and Engineering (ICSSE), Ho Chi
Minh, Vietnam, 27–28 July 2023; pp. 375–380. [CrossRef]
82. Almaraz-Rivera, J.G.; Perez-Diaz, J.A.; Cantoral-Ceballos, J.A. Transport and Application Layer DDoS Attacks Detection to IoT
Devices by Using Machine Learning and Deep Learning Models. Sensors 2022, 22, 3367. [CrossRef] [PubMed]
83. Ferrag, M.A.; Friha, O.; Hamouda, D.; Maglaras, L.; Janicke, H. Edge-IIoTset: A New Comprehensive Realistic Cyber Security
Dataset of IoT and IIoT Applications for Centralized and Federated Learning. IEEE Access 2022, 10, 40277–40288. [CrossRef]
84. Poisson, M.; Carnier, R.; Fukuda, K. GothX: A generator of customizable, legitimate and malicious IoT network traffic. In
Proceedings of the 17th Cyber Security Experimentation and Test Workshop, CSET ’24, Philadelphia, PA, USA, 13 August 2024;
pp. 65–73. [CrossRef]
85. Alsaedi, A.; Moustafa, N.; Tari, Z.; Mahmood, A.; Anwar, A. TON_IoT Telemetry Dataset: A New Generation Dataset of IoT and
IIoT for Data-Driven Intrusion Detection Systems. IEEE Access 2020, 8, 165130–165150. [CrossRef]

86. Binu, P.K.; Kiran, M. Attack and Anomaly Prediction in IoT Networks using Machine Learning Approaches. In Proceedings of
the 2021 Fourth International Conference on Electrical, Computer and Communication Technologies (ICECCT), Erode, India,
15–17 September 2021; pp. 1–6. [CrossRef]
87. Al-Hawawreh, M.; Sitnikova, E.; Aboutorab, N. X-IIoTID: A Connectivity-Agnostic and Device-Agnostic Intrusion Data Set for
Industrial Internet of Things. IEEE Internet Things J. 2022, 9, 3962–3977. [CrossRef]
88. KDD Cup 1999 Data. 1999. Available online: https://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html (accessed on 12
December 2024).
89. Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic
Characterization. In Proceedings of the 4th International Conference on Information Systems Security and Privacy (ICISSP).
SCITEPRESS, Madeira, Portugal, 22–24 January 2018; pp. 108–116. [CrossRef]
90. Peterson, J.M.; Leevy, J.L.; Khoshgoftaar, T.M. A Review and Analysis of the Bot-IoT Dataset. In Proceedings of the 2021 IEEE
International Conference on Service-Oriented System Engineering SOSE, Oxford, UK, 23–26 August 2021; pp. 20–27.
91. Atuhurra, J.; Hara, T.; Zhang, Y.; Sasabe, M.; Kasahara, S. Dealing with Imbalanced Classes in Bot-IoT Dataset. arXiv 2024,
arXiv:2403.18989.
92. MIT Lincoln Laboratory. 2000 DARPA Intrusion Detection Scenario Specific Datasets. 2000. Available online: https://archive.ll.mit.edu/
ideval/data/2000data.html (accessed on 12 December 2024).
93. Tavallaee, M.; Bagheri, E.; Lu, W.; Ghorbani, A.A. A detailed analysis of the KDD CUP 99 data set. In Proceedings of the 2009
IEEE Symposium on Computational Intelligence for Security and Defense Applications, Ottawa, ON, Canada, 8–10 July 2009;
pp. 1–6. [CrossRef]
94. Shiravi, A.; Shiravi, H.; Tavallaee, M.; Ghorbani, A.A. Toward developing a systematic approach to generate benchmark datasets
for intrusion detection. Comput. Secur. 2012, 31, 357–374. [CrossRef]
95. García, S.; Grill, M.; Stiborek, J.; Zunino, A. An empirical comparison of botnet detection methods. Comput. Secur. 2014,
45, 100–123. [CrossRef]
96. Moustafa, N.; Slay, J. The evaluation of network anomaly detection systems: Statistical analysis of the UNSW-NB15 data set and
the comparison with the KDD99 data set. Inf. Secur. J. Glob. Perspect. 2016, 25, 18–31. [CrossRef]
97. Alkasassbeh, M.; Al-Naymat, G.; Hawari, E. Towards Generating Realistic SNMP-MIB Dataset for Network Anomaly Detection.
Int. J. Comput. Sci. Inf. Secur. 2016, 14, 1162–1185.
98. Meidan, Y.; Bohadana, M.; Mathov, Y.; Mirsky, Y.; Breitenbacher, D.; Shabtai, A.; Elovici, Y. N-BaIoT: Network-Based Detection of
IoT Botnet Attacks Using Deep Autoencoders. 2018. Available online: https://archive.ics.uci.edu/dataset/442/detection+of+
iot+botnet+attacks+n+baiot (accessed on 20 December 2024).
99. Mirsky, Y.; Doitshman, T.; Elovici, Y.; Shabtai, A. Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection.
arXiv 2018, arXiv:1802.09089.
100. AWS. Canadian Institute for Cybersecurity CSE-CIC-IDS2018 Dataset. 2018. Available online: https://registry.opendata.aws/
cse-cic-ids2018/ (accessed on 12 December 2024).
101. Aubet, F.; Pahl, M. DS2OS Traffic Traces. 2018. Available online: https://www.kaggle.com/datasets/francoisxa/ds2ostraffictraces
(accessed on 12 December 2024).
102. Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. CIC-DDoS2019 Dataset. 2019. Available online: https://www.unb.ca/cic/
datasets/ddos-2019.html (accessed on 12 December 2024).
103. Garcia, S.; Parmisano, A.; Erquiaga, M.J. IoT-23: A labeled dataset with malicious and benign IoT network traffic. Zenodo 2020.
[CrossRef]
104. Ullah, I.; Mahmoud, Q.H. A Scheme for Generating a Dataset for Anomalous Activity Detection in IoT Networks. In Proceedings
of the Advances in Artificial Intelligence, Canberra, ACT, Australia, 29–30 November 2020; Goutte, C., Zhu, X., Eds.; Springer
International Publishing: Berlin/Heidelberg, Germany, 2020; pp. 508–520.
105. Ward, A.; Cordero, S. Application Layer DDoS Dataset. 2020. Available online: https://www.kaggle.com/datasets/wardac/
applicationlayer-ddos-dataset (accessed on 12 December 2024).
106. Jovanović, D.; Vuletić, P. ETF IoT Botnet Dataset. Mendeley Data 2021. [CrossRef]
107. Neto, E.; Dadkhah, S.; Ferreira, R.; Zohourian, A.; Lu, R.; Ghorbani, A. CICIoT2023: A real-time dataset and benchmark for
large-scale attacks in IoT environment. Sensors 2023, 23, 5941. [CrossRef]
108. Abdulrahman, A.; Ibrahem, M.K. Toward Constructing a Balanced Intrusion Detection Dataset Based on CICIDS2017. Samarra J.
Pure Appl. Sci. 2020, 2, 132–142.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
