DDoS Attack Simulation Using Big Data and Machine Learning
Abstract
Owing to developments in digital technologies, the use of internet services has surged recently. To
succeed, online businesses must be able to deliver their services consistently and effectively. A DDoS
attack degrades both the availability and the computational capacity of online resources. DDoS attacks
are attractive to cyber-attackers because there is no effective technique for identifying them. In
recent years, researchers have been experimenting with techniques such as machine learning (ML) to see
whether they can build effective methods for detecting DDoS attacks. In this paper, machine learning
and big data are used to identify DDoS attacks.
Keywords
DDoS attack, Machine learning, Big data
1. Introduction
A DDoS attack occurs when a large number of malicious computers attack the victim’s resources
in a coordinated fashion. Attack tools such as Slowloris, GoldenEye, and others make it easy
for anybody to launch a DDoS attack on a target and exhaust its resources or make its
bandwidth unavailable to others [1]. DDoS attacks come in a variety of forms, making it
difficult for detection filters to keep up [2]. When an attacker sends a large number of SYN
packets to the victim in order to overwhelm its connection table, this is known as TCP SYN
flooding. There are also UDP and HTTP flooding attacks that consume
International Conference on Smart Systems and Advanced Computing (Syscom-2021), December 25–26, 2021
A. Gaurav*, Z. Zhou, K. T. Chui*, F. Colace, P. Chaurasia, C. Hsu*
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org
Figure 1: DDoS attack architecture
the victim’s bandwidth and prevent legitimate users from accessing it. Defences against DDoS
attacks may be broken down into three categories: preventive methods, defensive methods, and
traceback methods. Preventive methods mitigate DDoS attacks through a variety of approaches,
including load balancing and honeypots. Traceback methods use a variety of techniques to
locate the origin of the attack. Defensive techniques must not only identify the specific DDoS
attack but also take suitable countermeasures against it. A good DDoS defence strategy must
also distinguish a DDoS attack from flash crowd traffic. A flash crowd occurs when many
legitimate users try to access an online resource at the same time. An effective DDoS detection
algorithm must therefore be able to tell flash crowd traffic apart from DDoS attack traffic and
must not discard it as part of the attack traffic. Owing to recent advances in machine learning,
several machine learning algorithms are now used to identify DDoS attacks and flash crowds.
By training on known attack patterns, machine learning algorithms make it feasible for security
filters to block new forms of threats. Machine learning offers two main strategies for detecting
anomalous traffic: supervised learning and unsupervised learning. Supervised learning relies on
labelled data sets, which are difficult to obtain, whereas unsupervised learning works with
unlabelled data.
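The difference between the two strategies can be illustrated with a minimal Python sketch. The two features (packets per second and number of distinct source addresses), the sample values, and all function names are hypothetical and chosen only for illustration; they are not taken from the dataset used in this paper. The supervised part is a nearest-centroid classifier that needs labels, while the label-free part is a simple median-based rule standing in for an unsupervised method.

```python
from statistics import mean, median

# Hypothetical labelled windows: (packets per second, distinct source IPs).
labelled = [
    ((120.0, 15.0), "normal"),
    ((150.0, 20.0), "normal"),
    ((9000.0, 800.0), "attack"),
    ((9500.0, 950.0), "attack"),
]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_supervised(samples):
    """Supervised: the labels are used to compute one centroid per class."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(axis) for axis in zip(*points))
        for label, points in by_label.items()
    }

def predict(model, features):
    """Return the label whose class centroid is nearest to the sample."""
    return min(model, key=lambda label: sq_dist(model[label], features))

def flag_by_rate(rates, factor=3.0):
    """Label-free rule: flag a window whose packet rate exceeds
    `factor` times the median rate. No labels are needed."""
    cutoff = factor * median(rates)
    return [rate > cutoff for rate in rates]

model = train_supervised(labelled)
prediction = predict(model, (8000.0, 700.0))
flags = flag_by_rate([120.0, 150.0, 130.0, 9000.0])
```

The supervised model can only be built once labelled samples are available, whereas the label-free rule runs directly on raw measurements but offers no guarantee that what it flags is actually an attack rather than a flash crowd.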
In this paper, we propose a big data-based method for the detection of DDoS attacks and
flash crowds. The contributions of this paper are as follows:
• We use a dataset generated with OMNeT++ for training and testing the machine learning
model.
• We use Apache Spark for data processing, which improves processing performance.
• We evaluate the performance of the proposed model using standard statistical metrics.
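As an illustration of the last contribution, the sketch below computes accuracy, precision, recall, and F1 score from true and predicted labels. Which metrics the evaluation actually uses is not stated at this point in the paper, so this selection is an assumption, and the label strings and function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive="attack"):
    """Count true/false positives and negatives for the positive class."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def classification_metrics(y_true, y_pred, positive="attack"):
    """Return accuracy, precision, recall and F1 as a dictionary."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical labels, for illustration only.
y_true = ["attack", "attack", "normal", "normal", "attack"]
y_pred = ["attack", "normal", "normal", "attack", "attack"]
metrics = classification_metrics(y_true, y_pred)
```

For DDoS detection, precision indicates how much legitimate traffic (for example a flash crowd) is wrongly blocked, while recall indicates how much attack traffic gets through.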
2. Related Work
Li et al. [3] developed an adaptive density-based clustering technique (ADBSCAN) based on
nearest neighbour graphs. The method uses KNN density estimation and distributional
assumptions to quickly identify clusters of different densities. Samples in dense areas are
found using the KNN estimator, and a statistical technique is then used to determine the
clusters’ densities. Gu et al. [4] applied K-means clustering to the identification of DDoS
attacks. In their methodology, Hadoop-based feature selection is used to select the features
that characterise DDoS attacks. K-means clustering then uses these features to distinguish
regular traffic from malicious traffic.
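The clustering step described above can be sketched with plain K-means in Python. This is not the method of [4]: it omits the feature selection stage, the two features and their values are hypothetical, and treating the cluster with the higher packet rate as the suspected attack cluster is an assumption made only for this example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Component-wise mean of a non-empty list of tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))

def kmeans(points, k=2, max_iter=50):
    """Plain K-means with deterministic initialisation (first k points)."""
    centroids = list(points[:k])
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        updated = [centroid(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    labels = [min(range(k), key=lambda i: sq_dist(p, centroids[i]))
              for p in points]
    return centroids, labels

# Hypothetical windows: (packets per second, distinct source IPs).
windows = [(100.0, 10.0), (120.0, 12.0), (110.0, 11.0),
           (9000.0, 900.0), (9500.0, 950.0), (9200.0, 920.0)]
centroids, labels = kmeans(windows, k=2)

# Assumption: the cluster with the higher mean packet rate is the suspect one.
suspect = max(range(2), key=lambda i: centroids[i][0])
flagged = [label == suspect for label in labels]
```

Because K-means itself is label-free, some additional rule or a small set of labelled samples is always needed to decide which cluster corresponds to attack traffic.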
The authors in [5] use an artificial neural network (ANN) to estimate the number of zombies
involved in an attack. Because the strategy does not rely on attack frequency, it can also
predict low-frequency attacks accurately. NS-2, a network simulator run on Linux, is used to
produce the training data for the feed-forward neural networks. Mean squared error (MSE) is
used to compare the estimation performance of various feed-forward networks. The networks
predict the number of zombies engaged in a DDoS attack with very low test error, which is
encouraging.
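The comparison by MSE can be made concrete with a short Python sketch. The zombie counts and the two sets of predictions are invented for illustration and are not results from [5].

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical numbers of zombies in four test scenarios.
actual = [10, 20, 30, 40]

# Hypothetical outputs of two candidate feed-forward networks.
predictions = {
    "network_a": [12, 18, 33, 41],
    "network_b": [15, 25, 20, 45],
}

errors = {name: mean_squared_error(actual, preds)
          for name, preds in predictions.items()}
best = min(errors, key=errors.get)  # network with the lowest test error
```

Since the errors are squared, a network that is occasionally far off is penalised more heavily than one that is slightly off on every scenario.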
For DDoS detection, the authors in [6] propose an architecture that monitors traffic changes
inside the ISP domain and then classifies the network flows that carry attack traffic.
Detection of DDoS attacks relies on two statistical metrics: volume and flow. The precision of
the chosen threshold values has a significant impact on the effectiveness of a system for
detecting and characterising anomalies. When threshold values are set too high or too low,
many tests return erroneous results. The proposed strategy therefore employs Six Sigma and
variable tolerance factor approaches to set threshold values accurately and dynamically for a
wide range of statistical measures. The efficacy of the technique is evaluated on a testbed on
a Linux platform, under several attack scenarios that differ in the number of zombie machines
and in attack strength. The authors in [7] present the impact of DDoS attacks on IoT devices.
In [8], the author proposes a CAPTCHA-based method for the identification of DDoS attacks.
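The role of the threshold in the approach of [6] can be illustrated with a generic sketch. The exact formulation used in [6] is not given here, so the code below assumes a common form in which the threshold is the baseline mean plus a tolerance factor times the standard deviation; the baseline values and function names are hypothetical.

```python
from statistics import mean, pstdev

def dynamic_threshold(baseline, tolerance=3.0):
    """Threshold = mean + tolerance * standard deviation of normal traffic.
    This form is an assumption, not the formula from [6]."""
    return mean(baseline) + tolerance * pstdev(baseline)

def is_anomalous(value, baseline, tolerance=3.0):
    """Flag a measurement that exceeds the dynamic threshold."""
    return value > dynamic_threshold(baseline, tolerance)

# Hypothetical traffic volume (packets per window) under normal conditions.
baseline = [100, 110, 90, 105, 95]

attack_flagged = is_anomalous(500, baseline)                 # clear spike
normal_flagged = is_anomalous(115, baseline)                 # mild fluctuation
false_alarm = is_anomalous(115, baseline, tolerance=1.0)     # threshold too low
```

With a tolerance of 3 the mild fluctuation is accepted, but with a tolerance of 1 the same value raises a false alarm, which is the sensitivity to threshold choice noted above.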
It is difficult to distinguish legitimate traffic from attack traffic during a DDoS attack.
Wireless networks are especially vulnerable to DDoS attacks because of the nature of ad hoc
networks. Rather than allowing a DDoS attack to occur and then taking the required actions to
deal with it, it is preferable to prevent it from happening in the first place. The authors in
[9] discuss how MANETs can be damaged by DDoS attacks and propose a novel DDoS mitigation
strategy for MANETs.
Gu et al. [10] detect DDoS attacks using semi-supervised K-means clustering. Their technique
identifies DDoS attacks using three primary features extracted from the data sets. The K-means
clustering procedure is accurate because the extracted features are used to label samples in
the data sets. Because it is semi-supervised, the clustering algorithm also has a high
convergence rate.
The authors in [11] proposed a DDoS attack detection technique for healthcare services. DDoS
detection techniques have also been proposed for cloud environments [12] and VANET
environments [13].
Gertrudes et al. [14] proposed a new approach based on density-based semi-supervised
clustering. The method makes use of several semi-supervised clustering techniques, and the
authors also describe the link between graph-based and density-based approaches. Unlike prior
semi-supervised techniques, the proposed framework requires no re-computation and has no
order dependencies.
3. Proposed Approach
This section describes the proposed solution, which we deploy on the routers. For each time
period Δt, the routers extract the characteristics of the incoming traffic. The extracted
attributes are then fed into the machine learning model, which classifies the traffic and
flags the packets it finds to be malicious. Finally, the router discards all malicious packets.
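The router-side procedure can be outlined as a minimal Python sketch. Several details are assumptions made for illustration: packets are grouped per source address within each time window, the features are a packet count and a mean packet size, and a fixed-threshold rule stands in for the trained machine learning model. None of these names or values come from the paper.

```python
from collections import defaultdict

def extract_features(packets):
    """Hypothetical features for one source within one time window."""
    sizes = [p["size"] for p in packets]
    return {"packet_count": len(packets),
            "mean_size": sum(sizes) / len(sizes)}

def filter_traffic(packets, window, classify):
    """Group packets by (time window, source), classify each group,
    and forward only the packets of groups not flagged as malicious."""
    groups = defaultdict(list)
    for p in packets:
        groups[(int(p["time"] // window), p["src"])].append(p)
    forwarded = []
    for key in sorted(groups):
        batch = groups[key]
        if classify(extract_features(batch)) == "malicious":
            continue  # the router discards these packets
        forwarded.extend(batch)
    return forwarded

def rule_based_classifier(features):
    """Stand-in for the trained model (assumption): a fixed rate threshold."""
    return "malicious" if features["packet_count"] > 5 else "legitimate"

# Hypothetical traffic: one high-rate source and one ordinary source.
packets = [{"time": i * 0.1, "src": "10.0.0.9", "size": 60} for i in range(10)]
packets += [{"time": 0.1, "src": "10.0.0.2", "size": 500},
            {"time": 0.5, "src": "10.0.0.2", "size": 500}]

forwarded = filter_traffic(packets, window=1.0, classify=rule_based_classifier)
```

Passing the classifier in as a function keeps the filtering loop independent of the model, so the stand-in rule could be replaced by any trained classifier that maps a feature dictionary to a label.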