Detection of Attacks (DoS, Probe) Using Genetic Algorithm Project Report
SATHYABAMA
INSTITUTE OF SCIENCE AND
TECHNOLOGY
(DEEMED TO BE UNIVERSITY)
Accredited with Grade “A” by NAAC
(Established under Section 3 of UGC Act, 1956)
JEPPIAAR NAGAR, RAJIV GANDHI SALAI, CHENNAI-600119
www.sathyabama.ac.in
BONAFIDE CERTIFICATE
Internal Guide
Dr. L. SUJIHELEN, M.E., Ph.D.,
DECLARATION
We, AVULA VENKATA SRINADH REDDY (Reg. No. 38110061) and
BODDU PRASANTH REDDY (Reg. No. 38110422), hereby declare
that the Project Report entitled “Detection of Attacks (DoS, Probe)
Using Genetic Algorithm”, done by us under the guidance of
Dr. L. SUJI HELEN, M.E., Ph.D., is submitted in partial fulfilment of
the requirements for the award of the Bachelor of Engineering
degree in Computer Science and Engineering.
DATE:
PLACE: CHENNAI
SIGNATURE OF THE CANDIDATES
ACKNOWLEDGEMENT
We convey our thanks to Dr. T. SASIKALA, M.E., Ph.D., Dean,
School of Computing, and Dr. S. VIGNESHWARI, M.E., Ph.D.,
Head of the Department, Department of Computer Science and
Engineering, for providing the necessary support and details at
the right time during the progressive reviews.
ABSTRACT
Distinguishing intrusive activity from normal network traffic is very
difficult and time-consuming: an analyst must review a large and
wide body of data to find the sequence of an intrusion on a network
connection. Therefore, a method is needed that can detect network
intrusions in a way that reflects current network traffic. In this study,
a novel method to find intrusion characteristics for an IDS using a
genetic algorithm, a machine learning technique of data mining, is
proposed. The method used to generate the rules is classification
by a genetic algorithm over decision trees. These rules can
determine intrusion characteristics, which are then implemented
with the genetic algorithm as prevention, so that besides detecting
the existence of an intrusion, the system can also respond by
denying the intrusion as prevention.
TABLE OF CONTENTS
3.3 Genetic Algorithm
4.4 Code
Conclusion
References
Publication
List of Figures
Figure no. Title
3.2.1 Decision Tree
3.3.1 Genetic Algorithm Steps
3.3.2 Genetic Algorithm Application
4.2.4 Validation
4.2.5 Prediction
4.3.1 Intrusion Detection Application
4.3.2 Training Dataset
4.3.3 Test All Attacks Dataset
4.3.4 All Attacks Plot Graph
4.3.5 Test Normal Attack Dataset
4.3.6 Normal Attack Plot Graph
List of Tables
Table no. Title
CHAPTER 1
INTRODUCTION
Approaches for intrusion detection can be broadly divided into
two types: misuse detection and anomaly detection. In a misuse
detection system, all known types of attacks (intrusions) can be
detected by looking for predefined intrusion patterns in the
system audit traffic. In anomaly detection, the system first learns
a normal activity profile and then flags all system events that do
not match the established profile. The main advantage of misuse
detection is its capability for a high detection rate, with the
difficulty that it cannot find new or unforeseen attacks. The
advantage of anomaly detection lies in the ability to identify novel
(or unforeseen) attacks, at the expense of a high false positive
rate. Network monitoring-based machine learning techniques
have been applied in diverse fields. Using bi-directional long
short-term memory neural networks, a social media network
monitoring system has been proposed for analysing and
detecting traffic accidents.
1.1 BACKGROUND
The proposed method retrieves traffic-related information from
social media (Facebook and Twitter) using query-based crawling:
this process collects sentences related to any traffic events, such
as jams, road closures, etc. Subsequently, several pre-processing
techniques are carried out, such as stemming, tokenization, POS
tagging and segmentation, in order to transform the retrieved
data into structured form. Thereafter, the data are automatically
labelled as ’traffic’ or ’non-traffic’, using a Latent Dirichlet
Allocation (LDA) algorithm. Traffic-labelled data are classified into
three types: positive, negative, and neutral. The output from this
stage is a sentence labelled according to whether it is traffic or
non-traffic, and with the polarity of that traffic sentence (positive,
negative or neutral). Then, using the bag-of-words (BoW)
technique, each sentence is transformed into a one-hot encoding
representation in order to feed it to the bi-directional LSTM neural
network (Bi-LSTM). After the learning
process, the neural networks perform multi-class classification
using the softmax layer in order to classify the sentence in terms
of location, traffic event and polarity types. The proposed method
compares different classical machine learning and advanced
deep learning approaches in terms of accuracy, F-score and
other criteria.
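As a rough illustration of the bag-of-words one-hot step described above, the following sketch uses scikit-learn's CountVectorizer with binary=True; the example sentences are invented for illustration and stand in for real crawled data.

# Illustrative bag-of-words one-hot encoding; the sentences are invented.
from sklearn.feature_extraction.text import CountVectorizer

sentences = [
    "heavy traffic jam on the highway",
    "road closure near the city centre",
    "lovely weather this morning",
]

# binary=True yields a presence/absence (one-hot) encoding per word
vectorizer = CountVectorizer(binary=True)
one_hot = vectorizer.fit_transform(sentences).toarray()
print(vectorizer.get_feature_names_out())
print(one_hot)  # each row could then be fed into a Bi-LSTM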
1.4 OBJECTIVE
The primary purposes for an IDS deployment are to reduce risk,
identify error, optimize network use, provide insight into threat
levels, and change user behavior. Thus, an IDS provides more
than just detection of intrusion.
Genetic Algorithms (GAs) for multi-objective optimization are
developed specifically for problems with multiple objectives. They
differ from traditional GAs primarily by using specialized fitness
functions and by introducing methods to promote solution diversity.
The goal of this algorithm is to create a model that predicts the
value of a target variable. The decision tree uses a tree
representation to solve the problem, in which each leaf node
corresponds to a class label and attributes are represented on
the internal nodes of the tree.
CHAPTER 2
AIM AND SCOPE OF THE PRESENT
INVESTIGATION
Cyber attacks can lead to network breakdowns, system paralysis,
online banking fraud and robbery. These issues have a
significantly destructive impact on organizations, companies or
even economies, so detecting them is critical; accuracy, high
performance and real-time operation are essential to achieve this
goal successfully. Extending intelligent machine learning
algorithms in a network intrusion detection system (NIDS)
through a software defined network (SDN) has attracted
considerable attention in the last decade. Big data availability, the
diversity of data analysis techniques, and the massive
improvement in machine learning algorithms enable the building
of an effective, reliable and dependable system for detecting
different types of attacks that frequently target networks. This
study demonstrates the use of machine learning algorithms for
traffic monitoring to detect malicious behaviour in the network, as
part of a NIDS in the SDN controller. Different classical and
advanced tree-based machine learning techniques (Decision
Tree, Random Forest and XGBoost) are chosen to demonstrate
attack detection. The NSL-KDD dataset is used for training and
testing the proposed methods; it is considered a benchmarking
dataset for several state-of-the-art approaches in NIDS. Several
advanced pre-processing techniques are performed on the
dataset in order to extract the best form of the data, which
produces outstanding results compared to other systems. Using
just five out of the 41 features of NSL-KDD, a multi-class
classification task is conducted by detecting whether there is an
attack and classifying the type of attack (DDoS, Probe, R2L, and
U2R), accomplishing an accuracy of 95.95%.
A network intrusion detection system is a process for discovering
the existence of malicious or unwanted packets in the network.
This process is done using real-time traffic monitoring to find out
if any unusual behaviour is present in the network or not. Big
data, powerful computation facilities, and the expansion of the
network size increase the demand for the required tasks that
should be carried out simultaneously in real-time. Therefore,
NIDS should be careful, accurate, and precise in monitoring,
which has not been the case in the traditional methods. On the
other hand, the rapid increase in the accuracy of machine
learning algorithms is highly impressive, and its adoption is
driven by the increasing demand for improved performance on
different types of networks. Meanwhile, software defined network (SDN)
implementation of the network-based intrusion detection system
(NIDS) has opened a frontier for its deployment, considering the
increasing scope and typology of security risks of modern
networks. The rapid growth in the volume of network data and
connected devices carries inherent security risks. The adoption of
technologies such as the Internet of Things (IoT), artificial
intelligence (AI), and quantum computing, has increased the
threat level, making network security challenging and
necessitating a new paradigm in its implementation. Various
attacks have overwhelmed previous approaches (classified into
signature-based intrusion detection systems and anomaly-based
intrusion detection systems), increasing the need for advanced,
adaptable and resilient security implementations. For this reason,
the traditional network design platform is being transformed into
the evolving SDN implementation. Monitoring data and analysing
it over time are essential to the process of predicting future
events, such as risks, attacks and diseases. The more details are
formed, discovered and documented through analysing very
large-scale data, the more resources are saved, and the working
environment will remain normal, without variations.
Big data analytics (BDA) research in the supply chain has
become a hidden protector for managing and preventing risks. BDA
for humanitarian supply chains can aid the donors in their
decision of what is appropriate in situations such as disasters,
where it can improve the response and minimize human suffering
and deaths. BDA and data monitoring using machine learning
can help in identifying and understanding the interrelationships
between the reasons, difficulties, obstacles and barriers that
guide organizations in taking the most efficient and accurate
decisions in risk management processes. This could impact
entire organizations and countries, producing a hugely significant
improvement in the process. Network monitoring-based machine
learning techniques have been applied in diverse fields; one
example is the social media monitoring system for detecting
traffic accidents already described in the Background section,
which combines LDA-based labelling, bag-of-words encoding
and a Bi-LSTM classifier. Many initiatives and workshops have
been conducted in order to improve and develop the healthcare
systems using machine learning, such as [12]. In these
workshops, several machine learning algorithms have been
used, such as K-Nearest Neighbours, logistic regression,
K-means clustering, Random Forest (RF), etc., together with
deep learning algorithms such as CNN, RNN, fully connected
layers and auto-encoders. These varieties of techniques allow
researchers to deal with several data types, such as medical
imaging, history, medical notes, video data, etc. Therefore,
different topics and applications have been introduced with
significant performance results, such as causal inference in
investigations of Covid-19, and disease prediction for disorders
and heart diseases. Using intelligent ensemble deep learning methods,
healthcare monitoring is carried out for prediction of heart
diseases. Real-time health status monitoring can prevent and
predict any heart attacks before occurrence. For disease
prediction, the proposed ensemble deep learning approach
achieved a brilliant accuracy performance score of 98.5%. The
proposed model takes two types of data that are transferred and
saved on an online cloud database. The first is the data
transferred from the sensors; these sensors have been placed in
different places on the body in order to extract more than 10
different types of medical data. The second type is the daily
electronic medical records from doctors, which includes various
types of data, such as smoking history, family diseases, etc. The
features are fused using the feature fusion Framingham Risk
factors technique, which executes two tasks at a time, fusing the
data together, and then extracting a fused and informative feature
from this data. Then different pre-processing techniques are used
to transform the data into a structured and well-prepared form,
such as normalization, missing values filtering and feature
weighting. Subsequently, an ensemble deep learning algorithm
starts which learns from the data in order to predict whether a
heart disease will occur or the threat is absent. IDS refers to a
mechanism capable of identifying or detecting intrusive activities.
In a broader view, this encompasses all the processes used in
the discovery of unauthorized uses of network devices or
computers. This is achieved through software designed
specifically to detect unusual or abnormal activities. IDS can be
classified, according to several surveys and sources in the
literature into four types (HIDS, NIDS, WIDS, NBA). NIDS is an
inline or passive-based intrusion detection technique. The scope
of its detection targets network and host levels. The only
architecture that fits and works with NIDS is the managed
network. The advantage of using NIDS is that it costs less and is
quicker in response, since there is no need to maintain sensor
programming at the host level. The performance of monitoring
the traffic is close to real-time; NIDS can detect attacks as they
occur. However, it has the following limitations: it does not
indicate whether such attacks are successful or not, and it has
restricted visibility inside the host machine. There is also no effective way
to analyse encrypted network traffic to detect the type of attack.
Moreover, NIDS may have difficulty capturing all packets in a
large or busy network. Thus, it may fail to recognize an attack
launched during a period of high traffic. SDN provides a novel
means of network implementation, stimulating the development
of a new type of network security application. It adopts the
concept of programmable networks through the deployment of
logically centralized management. The network deployment and
configuration are virtualized to simplify complex processes, such
as orchestration, network optimization, and traffic engineering. It
creates a scalable architecture that allows sufficient and reliable
services based on certain types of traffic. The global view
approach to a network enhances flow-level control of the
underlying layers. Implementing NIDS over SDN becomes a
major effective security defence mechanism for detecting network
attacks from the network entry point. NIDS has been
implemented and investigated for decades to achieve optimal
efficiency. It represents an application or device for monitoring
network traffic for suspicious or malicious activity or policy
violations. Such activities include malware attacks, untrustworthy
users, security breaches, and DDoS. NIDS focuses on identifying
anomalous network traffic or behaviour; its efficiency determines
whether network anomaly detection is adequately implemented as part of the
security implementation. Since it is nearly impossible to prevent
threats and attacks, NIDS will ensure early detection and
mitigation. However, the advancement in NIDS has not instilled
sufficient confidence among practitioners, since most solutions
still use less capable, signature-based techniques. This study
aims to increase the focus on several points:
Choosing the right algorithm for the right task, depending on the
data types, size, and network behaviour and needs.
Implementing an optimized development process by preparing
and selecting the benchmark dataset in order to build a promising
system in NIDS.
Analysing the data, then finding, shaping and engineering the
important features, using several pre-processing techniques
stacked together in an intelligent order to find the best accuracy
with the lowest amount of data representation and size.
Proposing an integrated and complete development process,
using those algorithms and techniques, from the selection of the
dataset to the evaluation of the algorithms with different metrics,
which can be extended to other NIDS applications.
Integrating machine learning algorithms into SDN has attracted
significant attention.
In one study, a solution was proposed that solved the issues in KDD Cup 99
by performing an extensive experimental study, using the NSL-
KDD dataset to achieve the best accuracy in intrusion detection.
The experimental study was conducted on five popular and
efficient machine learning algorithms (RF, J48, SVM, CART, and
Naïve Bayes). The correlation feature selection algorithm was
used to reduce the complexity of features, resulting in 13 features
only in the NSL-KDD dataset. This study tests the NSL-KDD
dataset’s performance for real-world anomaly detection in
network behaviour. Five classic machine learning models (RF,
J48, SVM, CART, and Naïve Bayes) were trained on all 41
features against the normal class and the four types of attacks
(DoS, Probe, U2R, and R2L), achieving average accuracies of
97.7%, 83%, 94%, 85%, and 70% for each algorithm, respectively. The same
models were trained again using the reduced 13 features to
achieve average accuracies of 98%, 85%, 95%, 86%, and 73%
for each model. In another study, a deep neural network model was proposed
to find and detect intrusions in the SDN. The NSL-KDD dataset
was used to train and test the model. The neural network was
constructed with five primary layers, one input layer with six
inputs, three hidden layers with (12, 6, 3) neurons, and one
output layer with two dimensions. The proposed method was
trained on six features chosen from 41 features in the NSL-KDD
dataset, which are basic and traffic features that can easily be
obtained from the SDN environment. The proposed method
calculates the accuracy, precision and recall, achieving an F1-
score of 0.75. A second evaluation was conducted on seven
classic machine learning models (RF, NB, NB Tree, J48, DT,
MLP, and SVM) proposed in earlier work, and the model achieved sixth place
out of eight. The same author extended the approach using a
gated recurrent unit neural network (GRU-RNN) for SDN
anomaly detection, achieving accuracy up to 89%. In addition,
the min-max normalization technique is used for feature scaling to
improve and boost the learning process. The SVM classifier,
integrated with the principal component analysis (PCA) algorithm,
was used for an intrusion detection application. The NSL-KDD
dataset is used in this approach to train and optimize the model
for detecting abnormal patterns. A Min-Max normalization
technique was proposed to solve the diversity data scale ranges
with the lowest misclassification errors. The PCA algorithm is
selected as a statistical technique to reduce the NSL-KDD
dataset’s complexity, reducing the number of trainable
parameters that needed to be learned. The nonlinear radial basis
function kernel was chosen for SVM optimization. Detection rate
(DR), false alarm rate (FAR), and correlation coefficient metrics
were chosen to evaluate the proposed model, with an overall
average accuracy of 95% using 31 features in the dataset. In
[32], an extreme gradient-boosting (XGBoost) classifier was used
to distinguish between two attacks, i.e., normal and DoS. The
detection method was analysed and conducted over POX SDN,
as a controller, which is an SDN open-source platform for
prototyping and developing a technique based on SDN. Mininet
was used to emulate the network topology to simulate real-time
SDN-based cloud detection. Logistic regression was selected as
a learning algorithm, with a regularization term penalty to prevent
overfitting. The XGBoost term was added and combined with the
logistic regression algorithm to boost the computations by
constructing structure trees. The dataset used in this approach
was KDD Cup 1999, from which 400 K samples were selected for
constructing the training set. Two types of normalization
techniques were used; one with a logarithmic-based technique
and one with a Min-Max-based technique. The average overall
accuracy for XGBoost, compared to RF and SVM, was 98%,
96%, and 97% respectively. Based on DDoS attack
characteristics, a detection system was simulated with the
Mininet and FL floodlight platform using the SVM algorithm [5].
The proposed method categorizes the characteristics into six
tuples, which are calculated from the packet network. These
characteristics are the speed of the source IP (SSIP), the speed
of the source port, the standard deviation of FL flow packets, the
deviation of FL flow bytes (SDFB), the speed of flow entries, and
the ratio of pair-FL flow. Based on the statistics calculated from
these six characteristics, the SVM classifier determines whether
the current network state is normal or under attack. Attack flow
(AF), DR, and FAR were chosen as metrics, achieving an
average accuracy of 95%. In TSDL, a model with
two stages of deep neural networks was designed and proposed
for NIDS, using a stacked auto-encoder, integrated with softmax
in the output layer as a classifier. TSDL was designed and
implemented for multi-class classification of attack detection.
Down-sampling and other pre-processing techniques were
performed over different datasets in order to improve the
detection rate, as well as the monitoring efficiency. The detection
accuracy for UNSW-NB15 was 89.134%. Different models of
neural networks, such as variation auto-encoder, seq2seq
structures using Long-Short-term-Memory (LSTM) and fully
connected networks were proposed in [34] for NIDS. The
proposed approach was designed and implemented to
differentiate between normal and attack packets in the network,
using several datasets, such as NSL-KDD, UNSW NB15,
KYOTO-HONEYPOT, and MAWILAB. A variety of pre-processing
techniques have been used, such as one-hot-encoding,
normalization, etc., for data preparation, feature manipulation and
selection and smooth training in neural networks. Those factors
are designed mainly, but not only, to enable the neural networks
to learn complex features from different scopes of a single
packet. Using 4 hidden layers, a deep neural network model [35]
was illustrated and implemented on KDD cup99 for monitoring
intrusion attacks. Feature scaling and encoding were used for
data pre-processing and lower data usage. More than 50 features
were used to perform this task on different datasets. Therefore,
complex hardware GPUs were used in order to handle this huge
number of features with lower training time. A supervised [36]
adversarial auto-encoder neural network was proposed for NIDS.
It combined GANS and a variation auto-encoder. GANS consists
of two different neural networks competing with each other,
known as the generator and the discriminator. The result of the
competition is to minimize the objective function as much as
possible, using the Jensen Shannon minimization algorithm. The
generator tries to generate fake data packets, while the
discriminator determines whether the data is real or fake; in other
words, it checks if that packet is an attack or normal. In addition,
the proposed method integrates the regularization penalty with
the model structure for overfitting control behaviour. The results
were reasonable in the detection rate of U2R and R2L but lower
in others. Multi-channel deep learning of features for NIDS was
presented in [37], using an auto-encoder involving a CNN, two fully
connected layers, and a softmax classifier at the output. The evaluation is
done over three different datasets; KDD cup99, UNSWNB15 and
CICIDS, with an average accuracy of 94%. The proposed model
provides effective results; however, the structure and the
characteristics of the attack were not highlighted clearly. The
proposed method enhances the implementation of NIDS by
deploying machine learning over SDN. It introduces a machine
learning algorithm for network monitoring within the NIDS
implementation on the central controller of the SDN. In this paper,
enhanced tree-based machine learning algorithms are proposed
for anomaly detection. Using only five features, a multi-class
classification task is conducted by detecting whether there is an
attack or not and classifying the type of attack.
In this section, we discuss and explain each component and its
role in the NIDS architecture. As shown, the SDN architecture
can be divided into three main layers, and the NIDS component
architecture is constructed in three main parts, as follows:
The infrastructure layer consists of two main parts: hardware and
software components. The hardware components are devices
such as routers and switches. The software components are
those components that interface with the hardware, such as
Open Flow switches.
The control layer is an intelligent network controller, such as an
SDN controller. The control layer is the layer responsible for
regulating actions and traffic data management by establishing or
denying every network flow.
The application layer is the one that performs all network
management tasks. These tasks can be performed using an SDN
controller and NIDS.
Attacks are created by an attacker and delivered through the
internet. NIDS is deployed over the SDN controller. As NIDS
listens to the network and actively compares all traffic against
predefined attack signatures, it detects the attacker’s scanning
attempts. It sends an alert to administrators through its control,
and the connections will be blocked due to specific rules in the
firewall or routers.
This section presents a generalized flowchart of the proposed
method. The dataset, pre-processing techniques, and proposed
machine learning algorithms will be presented and discussed.
In this subsection, a generalized block diagram is presented and
discussed. As shown, the NSL-KDD dataset is used. Data
analysis, feature engineering, and other pre-processing
techniques are conducted to train the model, using the best
hyper-parameters, with only five features. Tree-based algorithms
are used for the multi-class classification task. The processed
data enter the algorithm and are classified as to whether they
constitute an attack or are normal; then, the type of attack will be
analysed to see which category it belongs to, and action is taken
accordingly.
The KDD Cup is the leading data mining competition in the world.
The NSL-KDD dataset was proposed to solve many issues
represented in the KDD Cup 1999 dataset. Many researchers
have used the NSL-KDD dataset to develop and evaluate the
NIDS problem. The dataset includes all types of attacks. The
dataset has 41 features, categorized into three main types (basic
feature, content-based, and traffic-based features) and labelled
as either normal or attack, with the attack type precisely
categorized. The categories can be classified into four main
groups, with a brief description of each attack type and its impact.
As stated in the previous subsection, the dataset has 41 features
labelled as either normal or attack with the precise attack
category. After experimental trials, five features were selected out
of the 41 features in the NSL-KDD dataset as those with the most
impact on algorithm learning performance; the selected five
features are presented with a brief description. A sketch of
training on such a reduced feature set follows.
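As a hedged illustration of this setup, the sketch below trains an XGBoost classifier on a synthetic five-feature, five-class stand-in for the processed NSL-KDD data; the hyper-parameter values are illustrative assumptions, not the tuned ones from this study.

# Sketch of a five-feature multi-class attack classifier; synthetic data
# stands in for the processed NSL-KDD features.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# 5 features, 5 classes standing in for normal, DoS, Probe, R2L, U2R
X, y = make_classification(n_samples=2000, n_features=5, n_informative=5,
                           n_redundant=0, n_classes=5, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

model = XGBClassifier(n_estimators=200, max_depth=6, random_state=42)
model.fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))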
To evaluate the performance of NIDS in terms of accuracy (AC),
different metrics were used; precision (P), recall (R), and F-
measure (F). These metrics can be calculated using confusion
matrix parameters: true positive (the number of anomalous
instances that are correctly classified); false positive (the number
of normal instances that are incorrectly classified as anomalous);
true negative (the number of normal instances that are correctly
classified); and false negative (the number of anomalous
instances that are incorrectly classified as normal). A good NIDS
must achieve a high DR and a low FAR. Accuracy (AC): the
percentage of correctly classified network activities,
AC = (TP + TN) / (TP + TN + FP + FN). Precision (P): the
percentage of predicted anomalous instances that are actually
anomalous, P = TP / (TP + FP); the higher the precision, the
lower the FAR. Recall (R): the percentage of predicted attack
instances versus all attack instances present, R = TP / (TP + FN).
F-measure (F): the performance of the NIDS measured as the
harmonic mean of P and R, F = 2PR / (P + R); we aim to achieve
a high F-score. We compare XGBoost against the other two
tree-based methods, RF and DT, using the test set, which
includes the four types of attacks as discussed. Three different
evaluation metrics are computed: F-score, precision and recall.
XGBoost ranked first in the evaluation, with an F1-score of
95.55%, while RF and DT achieved 94.6% and 94.5%,
respectively. For precision, XGBoost outperformed RF and DT
with a score of 92%, while RF and DT scored 90% and 90.2%
respectively. Finally, for Recall, our proposed method with
XGBoost proves its stability with a score of 98% while for RF and
DT, the results were 82%, and 85%, respectively. From these
results, the proposed model with XGBoost performs with high
precision and high recall, which means that the classifier returns
accurate results and high precision, while, at the same time,
returning a majority of all positive results (it’s an attack and the
classifier detects that it’s an attack), which means high recall.
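For reference, these metrics can be computed with scikit-learn as in the following sketch, where the toy labels stand in for real test labels and model predictions.

# Computing the evaluation metrics defined above; toy labels only.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

y_true = [0, 1, 1, 0, 1, 0, 1, 1]   # 0 = normal, 1 = attack
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F-measure:", f1_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))  # [[TN, FP], [FN, TP]]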
Finally, we evaluate the proposed method using an accuracy
analysis against seven classical machine learning algorithms, in
addition to the deep neural network. The proposed method
achieves an accuracy of 95.55%, while the second-best accuracy
performance is 82.02% for the NB Tree, showing a significant
difference between the accuracy of our proposed method and the
other approaches. This evaluation confirms that the proposed
method is accurate and robust, even compared against other
algorithms. This shows how the unambiguous steps in our
approach are reliable, effective and authoritative. We conclude
that the proposed method achieves a verifiable result using
several techniques. For the precise literature and comparison, we
carefully chose the NSL-KDD data set, which is considered one
of the most powerful benchmark datasets. Several procedures of
data statistics, cleaning and verification are performed on the
dataset, which are very important in order to produce a smooth
learning process with no obstacles, such as over- or under-fitting
issues. This stage ensures that the proposed model has unified
data and increases the value of data, which helps in decision-
making. Feature normalization and selection clarifies the path for
clear selection and intelligent preferences, using only 5 features.
Subsequently, more detailed exploration and various
comparisons are carried out, based on three machine learning
algorithms, i.e., DT, RF, and XGBoost, in order to test their
performance with different criteria and then select the best
performing algorithm for our task. This shows that the selection is
dependably proven and technically verified.
NIDS based on machine learning algorithms in SDN has
attracted significant attention in the last two decades because of
the datasets and the various algorithms proposed in machine
learning that use only limited features, enabling better detection
of anomalies and more efficient network security. In this study, the
benchmarking dataset NSL-KDD is used for training and testing.
Feature normalization, feature selection and data pre-processing
techniques are used in order to improve and optimize the
algorithm’s performance for accurate prediction, as well as to
facilitate a smooth training process with minimal time and
resources. To select the appropriate algorithm, we compare three
classical tree-based machine learning algorithms; Random
Forest, Decision Trees and XGBoost. We examine them using a
variety of evaluation metrics to find the disadvantages and
advantages of using one or more. Using six different evaluation
metrics, the proposed XGBoost model outperformed more than
seven algorithms used in NIDS. The proposed method focused
on detecting anomalies and protecting the SDN platform from
attacks in real-time scenarios. The proposed methods performed
two tasks simultaneously; to detect if there is an attack or not,
and to determine the type of attack (DoS, Probe, U2R, R2L). In
future studies, more evaluation metrics will be carried out. We
plan to implement the approach using several deep neural
network algorithms, such as Auto-Encoder, Generative
Adversarial Networks, and Recurrent neural networks, such as
GRU and LSTM. These techniques have been proven in the
literature to allow convenient anomaly detection approaches in
NIDS applications. Also, we plan to compare these algorithms
against each other and integrate one or more neural network
architectures to extract more details of how we can implement an
efficient anomaly detection system in NIDS, with lower
consumption of time and resources. In addition, for a more solid
basis for comparison, several benchmarking cyber security
datasets, such as NSL-KDD, UNSW-NB15, and CIC-IDS2017,
will be used, in order to make sure that the selection of the
proposed algorithm is not biased in any situation. These various
datasets are generated in different environments and conditions,
so more complex features will be available, more generalized
attacks will be covered and the accuracy of the proposed
algorithm will significantly increase, which could lead to a state-
of-the art approach.
2.4 Intrusion Prevention System Using Intrusion Detection
System Decision Tree Data Mining
Problem statement: distinguishing intrusive from normal network
traffic activity is very difficult and time-consuming. An analyst
must review a large and wide body of data to find the sequence
of an intrusion on a network connection. Therefore, a method is
needed that can detect network intrusions in a way that reflects
current network traffic.
Approach: in this study, a novel method to find intrusion
characteristics for an IDS using decision-tree machine learning, a
data mining technique, was proposed. The method used to
generate the rules is classification by the ID3 decision-tree
algorithm.
Results: these rules can determine intrusion characteristics,
which are then implemented in the firewall policy rules as
prevention.
Conclusion: the combination of an IDS and a firewall is the
so-called IPS, so that besides detecting the existence of an
intrusion, the system can also respond by denying the intrusion
as prevention.
With global Internet connectivity, network security has gained
significant attention in research and industrial communities. Due
to the increasing threat of network attacks, firewalls have become
important elements of the general security policy. A firewall can
allow or deny a network packet, but it cannot detect an intrusion
or attack, so intrusion detection is needed, whose results are
then implemented in the firewall access control system as
prevention. Intrusion detection is also considered a
complementary solution to firewall technology, recognizing
attacks against the network that are missed by the firewall.
Firewalls and IDS represent long-standing terminology in the field
of IT security. A firewall is good for protecting a system and
network and can minimize the risk of attacks on the network. An
IDS can detect the existence of an intrusion or attack. The joint
ability of an IDS and a firewall is the so-called IPS: a tool that
detects an intrusion, which is then denied by the firewall for
prevention. For each type of network traffic, there are one or
more different rules. Every network packet that arrives at the
firewall must be checked against the defined rules until a
matching rule is found. The packet will then be allowed or banned
access to the network, depending on the action specified in the
matching rule. Each rule identifies a specific type of network
traffic. Characteristics that reflect current network traffic can be
observed from network traffic logs, much as humans recognize
patterns. This study focuses on methods to prevent attempted
intrusions: finding intrusion characteristics in the network traffic
as an IDS, then implementing them in the firewall policy rules as
prevention. Rules of intrusion characteristics are found using
decision-tree machine learning data mining; the method used to
generate the rules is classification by the ID3 decision-tree
algorithm. It is efficient and optimized for making the filtering
rules in the firewall.
Log files:
Log files can give an idea of what the different parts of a system
are doing. Logs can show what is going right and what is going
wrong. Log files can provide a useful profile of activity. From a
security standpoint, it is crucial to be able to distinguish normal
activity from the activity of someone attacking a server or
network. Log files are useful for three reasons:
Log files help with troubleshooting system problems and
understanding what is happening on the system.
Logs serve as an early warning for both system and security
events.
Logs can be indispensable in reconstructing events, whether
determining that an intrusion has occurred and performing the
follow-up forensic investigation, or just profiling normal activity.
A decision tree is a technique in the classification method of data
mining for learning patterns from data and using these patterns
for classification. Decision trees are structures used to classify
data with common attributes; each path through a decision tree
represents a rule that categorizes data according to these
attributes.
Each node (non-leaf node) denotes a test on an attribute, each
branch represents an outcome of the test, and each leaf
(terminal) node holds a class label. The topmost node in a tree is
the root node. A decision tree classifier is one of the most widely
used supervised learning methods for data exploration. It is easy
to interpret, can be represented as if-then-else rules, and works
well on noisy data. A decision tree aids in data exploration in the
following manner:
It reduces a volume of data by transformation into a more
compact form that preserves the essential characteristics and
provides an accurate summary.
It discovers whether the data contains well-separated classes of
objects, such that the classes can be interpreted meaningfully in
the context of a substantive theory.
It maps data from the leaves back to the root, which may be
used to predict the outcome for new data or queries.
CHAPTER 3
EXPERIMENTAL METHODS AND ALGORITHMS
USED
3.1 MACHINE LEARNING SCOPE
Machine learning is a promising approach to achieving
human-computer integration and can be applied in many
computing fields. Machine learning is not a single typical method,
as it contains many different computer algorithms, and different
algorithms aim to solve different machine learning tasks.
Ultimately, all the algorithms can help the computer to act more
like a human. Machine learning is already applied in many fields,
for instance pattern recognition, artificial intelligence, computer
vision, data mining, text categorization and so on. Machine
learning gives a new way to develop the intelligence of machines,
and it also becomes an easier way to help people analyse data
from huge data sets. A learning method is a complicated topic
which takes many different forms. Everyone has different
methods of study, and so does the machine. We can categorize
various machine learning systems by different conditions. In
general, we can separate learning problems into two main
categories: supervised learning and unsupervised learning.
3.1.1 SUPERVISED LEARNING
Supervised learning is a commonly used machine learning
algorithm which appears in many different fields of computer
science. In the supervised learning method, the computer can
establish a learning model based on the training data set.
According to this learning model, a computer can use the
algorithm to predict or analyze new information. By using special
algorithms, a computer can find the best result and reduce the
error rate all by itself. Supervised learning is mainly used for two
different patterns: classification and regression.
In supervised learning, when a developer gives the computer
some samples, each sample is always attached to some
classification information. The computer analyses these samples
to gain learning experience, so that the error rate is reduced
when the classifier performs recognition for each pattern.
3.1.2 UNSUPERVISED LEARNING
In unsupervised learning, the training samples carry no labels, so
the computer is never told whether its output is correct or not.
When the computer receives the original data, it can find the
potential regularities within the information automatically and
then apply these regularities to new cases. That is what
distinguishes unsupervised learning from supervised learning. In
some cases, this method is more powerful than supervised
learning, because there is no need to classify the samples in
advance. Sometimes our classification method may not be the
best one; on the other hand, a computer may find the best
method after it learns from the samples again and again.
Suppose ten people are choosing between two movies, “Lucy”
and “Titanic”, and the votes are split five and five. How do we
decide which movie to watch now, when the votes for both
movies are somewhat equal?
This is exactly what we call disorder: there is an equal number of
votes for both movies, and we can’t really decide which movie to
watch. It would have been much easier if the votes for “Lucy”
were 8 and for “Titanic” 2. Here we could easily say that the
majority of votes are for “Lucy”, hence everyone will be watching
that movie.
In a decision tree, the output is mostly “yes” or “no”. The formula
for Entropy is shown below:
E(S) = -p(yes) * log2(p(yes)) - p(no) * log2(p(no))
or, in general for c classes, E(S) = -Σ p_i * log2(p_i), where p_i is
the proportion of samples belonging to class i.
How do Decision Trees use Entropy?
Now that we know what entropy is and what its formula is, we
need to know how exactly it works in this algorithm.
Entropy basically measures the impurity of a node. Impurity is the
degree of randomness; it tells how random our data is. A pure
sub-split means that you should be getting either all “yes” or all
“no”.
Suppose feature 1 has 8 yes and 4 no; after the split, feature 2
gets 5 yes and 2 no, whereas feature 3 gets 3 yes and 2 no.
We see here that the split is not pure. Why? Because we can still
see some negative classes in both features. In order to build a
decision tree, we need to calculate the impurity of each split, and
when the purity is 100% we make it a leaf node. To check the
impurity of feature 2 and feature 3, we take the help of entropy.
We can clearly see that feature 2 has lower entropy, or more
purity, than feature 3, since feature 2 has more “yes” and it is
easier to make a decision here.
Always remember that the higher the Entropy, the lower will be
the purity and the higher will be the impurity.
As mentioned earlier, the goal of machine learning is to decrease
the uncertainty or impurity in the dataset. Using entropy, we get
the impurity of a particular feature or node, but we do not know
whether the entropy has decreased relative to the parent node.
For this, we bring in a new metric called “information gain”, which
tells us how much the parent entropy has decreased after
splitting on some feature.
Information Gain:
Information gain measures the reduction in uncertainty given
some feature, and it is also the deciding factor for which attribute
should be selected as a decision node or root node. It is simply
the entropy of the full dataset minus the weighted entropy of the
dataset given some feature:
IG(S, A) = E(S) - Σ (|S_v| / |S|) * E(S_v), summed over the values v of feature A.
Let’s see how our decision tree is built using these two features.
We use information gain to decide which feature should be the
root node and which feature should be placed after the split.
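A minimal sketch of both computations, written directly from the formulas above rather than taken from any particular library, is:

# Entropy and information gain, computed from the formulas in the text.
import math
from collections import Counter

def entropy(labels):
    """E(S) = -sum(p_i * log2(p_i)) over the class proportions p_i."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(parent_labels, child_splits):
    """Parent entropy minus the weighted entropy of the child splits."""
    total = len(parent_labels)
    weighted = sum(len(s) / total * entropy(s) for s in child_splits)
    return entropy(parent_labels) - weighted

# The 8-yes/4-no example above, split into (5 yes, 2 no) and (3 yes, 2 no):
parent = ["yes"] * 8 + ["no"] * 4
split = [["yes"] * 5 + ["no"] * 2, ["yes"] * 3 + ["no"] * 2]
print(information_gain(parent, split))  # positive but small: an impure split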
When to stop splitting?
You may be asking yourself when we stop growing the tree.
Usually, real-world datasets have a large number of features,
which results in a large number of splits, which in turn gives a
huge tree. Such trees take time to build and can lead to
overfitting: the tree will give very good accuracy on the training
dataset but bad accuracy on test data.
There are many ways to tackle this problem through
hyper-parameter tuning. We can set the maximum depth of our
decision tree using the max_depth parameter. The greater the
value of max_depth, the more complex the tree will be. The
training error will of course decrease as we increase max_depth,
but when the test data comes into the picture, we will get very
bad accuracy. Hence we need a value that neither overfits nor
underfits the data, and for this we can use GridSearchCV (a
sketch follows the list of hyper-parameters below).
Another way is to set the minimum number of samples for each
split, denoted by min_samples_split. Here we specify the
minimum number of samples required to perform a split. For
example, we can require a minimum of 10 samples to reach a
decision: if a node has fewer than 10 samples, then using this
parameter we stop further splitting of this node and make it a leaf
node. There are more hyper-parameters, such as:
min_samples_leaf – the minimum number of samples required to
be in a leaf node. The more you increase this number, the more
the tree is constrained, reducing the risk of overfitting.
max_features – helps us decide how many features to consider
when looking for the best split. These hyper-parameters are
described in more detail in the scikit-learn documentation.
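A hedged sketch of such tuning with GridSearchCV follows; the parameter grid and the iris stand-in data are illustrative assumptions, not values from this project.

# Tuning max_depth, min_samples_split and min_samples_leaf with
# GridSearchCV; iris data stands in for the real dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

param_grid = {
    "max_depth": [3, 5, 10, None],
    "min_samples_split": [2, 10, 20],
    "min_samples_leaf": [1, 5, 10],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=42),
                      param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)  # the combination that generalizes best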
Pruning
Pruning is another method that can help us avoid overfitting. It
improves the performance of the tree by cutting the nodes or
sub-nodes which are not significant. There are mainly two ways
of pruning:
Pre-pruning – we can stop growing the tree earlier, which means
we can prune/remove/cut a node if it has low importance while
growing the tree.
Post-pruning – once our tree is built to its depth, we can start
pruning the nodes based on their significance.
Endnotes
To summarize, in this section we learned about decision trees:
on what basis the tree splits the nodes, how we can stop
overfitting, and why linear regression does not work in the case
of classification problems. Random Forests, which are another
technique to avoid overfitting, build on these ideas.
Fig.3.2.1 Decision Tree
Fig.3.3.1 Genetic Algorithm Steps
Steps Involved in Genetic Algorithm:
Initialisation
Fitness Function
Selection
Crossover
Mutation
Application of Genetic Algorithm: Feature Selection
Every time you participate in a data science competition, how do
you select the features that are important in predicting the target
variable? You usually look at the feature importances of some
model, manually decide on a threshold, and select the features
whose importance lies above that threshold.
Is there a better way to deal with this kind of situation? Actually,
one of the most advanced algorithms for feature selection is the
genetic algorithm.
The method here is exactly the same as the one used for the
knapsack problem.
We again start with a population of chromosomes, where each
chromosome is a binary string: 1 denotes “inclusion” of a feature
in the model and 0 denotes “exclusion” of the feature from the
model.
The other difference is the fitness function: the fitness function
here is our accuracy metric for the competition. The more
accurately a chromosome’s feature set predicts the target, the
fitter it is, as the sketch below illustrates.
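The sketch below illustrates this chromosome encoding and accuracy-based fitness on a stand-in dataset; it is an illustration of the idea, not the GAAlgorithm/Individual code listed later in this report.

# Binary chromosomes select feature columns; classifier accuracy is fitness.
import random
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

def fitness(chromosome):
    """Accuracy of a decision tree trained only on the features marked 1."""
    cols = [i for i, bit in enumerate(chromosome) if bit == 1]
    if not cols:
        return 0.0
    clf = DecisionTreeClassifier(random_state=42).fit(X_tr[:, cols], y_tr)
    return clf.score(X_te[:, cols], y_te)

# A small random population of feature masks, ranked by fitness.
population = [[random.randint(0, 1) for _ in range(X.shape[1])]
              for _ in range(10)]
best = max(population, key=fitness)
print(sum(best), "features selected, fitness =", fitness(best))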
You may now be wondering whether there is any use for such
laborious tasks. Rather than answer that question directly, let us
look at an implementation using the TPOT library, and then you
can decide.
Implementation using the TPOT library
First, let’s take a quick look at TPOT (Tree-based Pipeline
Optimization Tool), which is built upon the scikit-learn library.
A basic pipeline structure is shown in the image below.
Fig.3.3.2 Genetic Algorithm Application
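A minimal sketch of how TPOT is typically driven is shown below; the generation and population settings are illustrative assumptions, and the digits data stands in for a real dataset.

# TPOT runs a genetic search over scikit-learn pipeline structures.
from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

tpot = TPOTClassifier(generations=5, population_size=20,
                      verbosity=2, random_state=42)
tpot.fit(X_tr, y_tr)                 # evolve candidate pipelines
print(tpot.score(X_te, y_te))
tpot.export("best_pipeline.py")      # writes the winning pipeline as code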
Fig.3.4.2 Flow Diagram
CHAPTER 4
RESULTS, DISCUSSION AND PERFORMANCE
ANALYSIS
4.1 Requirements
Hardware requirements:
System: Pentium i3 processor
Hard disk: 500 GB
Monitor: 15" LED
Input devices: keyboard, mouse
RAM: 2 GB
Software requirements:
4.2 MODULES
What is a machine learning model?
A machine learning model is nothing but a piece of code that an
engineer or data scientist makes smart through training with data.
So, if you give garbage to the model, you will get garbage in
return, i.e., the trained model will give false or wrong predictions.
Data Collection
The KDD data set is a well-known benchmark in the research of
intrusion detection techniques. Much work is going into the
improvement of intrusion detection strategies, while research on
the data used for training and testing the detection model is of
equally prime concern, because better data quality can improve
offline intrusion detection. This section presents an analysis of
the KDD data set with respect to four classes of attributes: Basic,
Content, Traffic and Host.
Data Pre-Processing
Data pre-processing is the process of cleaning raw data: data
collected from the real world is converted into a clean data set. In
other words, whenever data is gathered from different sources, it
is collected in a raw format that is not feasible for analysis.
Therefore, certain steps are executed to convert the data into a
small, clean data set; this part of the process is called data
pre-processing, as illustrated in the sketch below.
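A small sketch of such cleaning and min-max normalization, on a tiny inline frame standing in for real raw traffic data, is:

# Basic cleaning, categorical encoding and min-max scaling on stand-in data.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    "proto": ["tcp", "udp", "tcp", "tcp"],
    "src_bytes": [491, 146, 491, 0],
    "dst_bytes": [0, 0, 0, 20],
})
df = df.drop_duplicates().dropna()       # remove duplicates and missing rows

# encode categorical columns as integer codes
cat = df.select_dtypes(include="object").columns
df[cat] = df[cat].apply(lambda c: c.astype("category").cat.codes)

# scale all columns into [0, 1] (min-max normalization)
df[df.columns] = MinMaxScaler().fit_transform(df)
print(df)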
Feature Extraction
Feature extraction is done to reduce the number of attributes in
the dataset, providing advantages such as faster training and
accuracy improvements; a short example follows.
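As a short illustration (using a chi-squared filter as a stand-in for the experimental feature selection used in this project):

# Filter-based attribute reduction: keep the k best-scoring features.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_breast_cancer(return_X_y=True)
X_reduced = SelectKBest(chi2, k=5).fit_transform(X, y)  # keep 5 features
print(X.shape, "->", X_reduced.shape)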
Model training
A training model is a dataset that is used to train an ML
algorithm. It consists of the sample output data and the
corresponding sets of input data that have an influence on the
output.
Training set:
The training set is the material through which the computer learns
how to process information. Machine learning uses algorithms to
perform the training part. It is a set of data used for learning, that
is, to fit the parameters of the classifier.
Validation set:
Cross-validation is primarily used in applied machine learning to
estimate the skill of a machine learning model on unseen data; a
set of data held out from the training data is used to tune the
parameters of the classifier, as sketched below.
Fig.4.2.4 Validation
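A sketch of estimating model skill with cross-validation, using a stand-in dataset, is:

# 5-fold cross-validation: average skill across held-out folds.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=5)
print(scores.mean(), "+/-", scores.std())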
Once the data is divided into the 3 given segments we can start
the training process.
In a data set, a training set is used to build a model, while a test
(or validation) set is used to validate the model built. Data points
in the training set are excluded from the test (validation) set.
Usually, a data set is divided into a training set and a validation
set (some people use ‘test set’ instead) in each iteration, or
divided into a training set, a validation set and a test set in each
iteration. The model is any one of the models chosen in step 3.
Once the model is trained, we can use the same trained model to
predict on the testing data, i.e., the unseen data. Once this is
done, we can compute a confusion matrix, which tells us how
well our model is trained. A confusion matrix has four parameters:
‘true positives’, ‘true negatives’, ‘false positives’ and ‘false
negatives’. We prefer to get more values in the true negatives
and true positives to get a more accurate model. The size of the
confusion matrix depends entirely on the number of classes, as
the end-to-end sketch below shows.
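An end-to-end sketch of this train/predict/confusion-matrix flow, again on stand-in data, is:

# Train on one split, predict on the held-out split, inspect the matrix.
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

model = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)
pred = model.predict(X_te)               # predictions on unseen data
print(confusion_matrix(y_te, pred))      # matrix size follows the class count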
4.3 RESULT
Fig.4.3.2 Train Data Set
Fig.4.3.4 All Attacks Plot graph
Fig.4.3.6 Normal Attacks Plot Graph
4.4 Code
Dataset.py
import pyshark
import random

class Packet:
    packet_list = list()

    def initiating_packets(self):
        # Capture 25 live packets and keep those with an IP layer plus
        # either a UDP or a TCP layer.
        self.packet_list.clear()
        capture = pyshark.LiveCapture(interface="Wi-Fi")
        for packet in capture.sniff_continuously(packet_count=25):
            try:
                if "<UDP Layer>" in str(packet.layers) and "<IP Layer>" in str(packet.layers):
                    self.packet_list.append(packet)
                elif "<TCP Layer>" in str(packet.layers) and "<IP Layer>" in str(packet.layers):
                    self.packet_list.append(packet)
            except AttributeError:
                print(f"No attribute named 'ip' {packet.layers}")

    def udp_packet_attributes(self, packet):
        # Build the 18-attribute feature vector for a UDP packet.
        # (__get_land, __get_dst_host_count and __get_dst_host_srv_count are
        # defined elsewhere in the project; they are not shown in this listing.)
        attr_list = list()
        a1 = packet.ip.ttl
        a2 = packet.ip.proto
        a3 = self.__get_service(packet.udp.port, packet.udp.dstport)
        a4 = packet.ip.len
        a5 = random.randrange(0, 1000)
        a6 = self.__get_land(packet, a2)
        a7 = 0  # urgent pointer does not exist in the UDP layer
        a8, a10, a11 = self.__get_count_with_same_and_diff_service_rate(packet.udp.dstport, a3)  # 23, 29, 30
        a9, a12 = self.__get_srv_count_and_srv_diff_host_rate(packet.ip.dst, a3)  # 24, 31
        a13, a15, a16 = self.__get_dst_host_count(packet.ip.dst, a3)  # 32, 34, 35
        a14, a17, a18 = self.__get_dst_host_srv_count(packet.udp.port, packet.udp.dstport, packet.ip.dst)  # 33, 36, 37
        attr_list.extend((a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15, a16, a17, a18))
        return self.get_all_float(attr_list)

    def tcp_packet_attributes(self, packet):
        # Build the 18-attribute feature vector for a TCP packet.
        attr_list = list()
        a1 = packet.ip.ttl                # duration
        a2 = packet.ip.proto              # protocol
        a3 = self.__get_service(packet.tcp.port, packet.tcp.dstport)  # service
        a4 = packet.ip.len                # src bytes
        a5 = random.randrange(0, 1000)    # dst bytes
        a6 = self.__get_land(packet, a2)  # land
        a7 = packet.tcp.urgent_pointer    # urgent pointer
        a8, a10, a11 = self.__get_count_with_same_and_diff_service_rate(packet.tcp.dstport, a3)  # 23, 29, 30
        a9, a12 = self.__get_srv_count_and_srv_diff_host_rate(packet.ip.dst, a3)  # 24, 31
        a13, a15, a16 = self.__get_dst_host_count(packet.ip.dst, a3)  # 32, 34, 35
        a14, a17, a18 = self.__get_dst_host_srv_count(packet.tcp.port, packet.tcp.dstport, packet.ip.dst)  # 33, 36, 37
        attr_list.extend((a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15, a16, a17, a18))
        return self.get_all_float(attr_list)

    def __get_service(self, src_port, dst_port):
        # Map the packet to a known service port (HTTP, HTTPS or DNS).
        services = [80, 443, 53]
        if int(src_port) in services:
            return int(src_port)
        elif int(dst_port) in services:
            return int(dst_port)
        else:
            return 53

    def __get_count_with_same_and_diff_service_rate(self, dst_port, service):  # 23, 29, 30
        count = 0
        packet_with_same_service = 0
        for p in self.packet_list:
            if "<UDP Layer>" in str(p.layers):
                if p.udp.dstport == dst_port:       # same destination port
                    count += 1
                    if self.__get_service(p.udp.port, p.udp.dstport) == service:  # same service
                        packet_with_same_service += 1
            elif "<TCP Layer>" in str(p.layers):
                if p.tcp.dstport == dst_port:
                    count += 1
                    if self.__get_service(p.tcp.port, p.tcp.dstport) == service:
                        packet_with_same_service += 1
        same_service_rate = 0.0
        diff_service_rate = 1.0
        if not count == 0:  # to avoid a zero-division error
            same_service_rate = ((packet_with_same_service * 100) / count) / 100
            diff_service_rate = diff_service_rate - same_service_rate
        return (count, same_service_rate, diff_service_rate)

    def __get_srv_count_and_srv_diff_host_rate(self, dst_ip, service):  # 24, 31
        diff_dst_ip = 0
        service_count = 0
        for p in self.packet_list:
            if "<UDP Layer>" in str(p.layers):
                if self.__get_service(p.udp.port, p.udp.dstport) == service:
                    service_count += 1
                    if not p.ip.dst == dst_ip:
                        diff_dst_ip += 1
            elif "<TCP Layer>" in str(p.layers):
                if self.__get_service(p.tcp.port, p.tcp.dstport) == service:
                    service_count += 1
                    if not p.ip.dst == dst_ip:
                        diff_dst_ip += 1
        srv_diff_host_rate = 0.0
        if not service_count == 0:
            srv_diff_host_rate = ((diff_dst_ip * 100) / service_count) / 100
        return (service_count, srv_diff_host_rate)

    def get_all_float(self, l):
        # Convert every attribute to a float rounded to one decimal place.
        all_float = list()
        for x in l:
            all_float.append(round(float(x), 1))
        return all_float
GAAlgorithm.py
import Population
import random

class GAAlgorithm():
    # NOTE: the original listing does not show a constructor; this minimal
    # version is an assumption, wiring up the attributes the methods rely on.
    def __init__(self, population_size=10, mutation_rate=0.1):
        self.population_size = population_size
        self.mutation_rate = mutation_rate
        self.population = Population.Population(population_size)

    def initialization(self):
        self.population.initialize_population()

    def calculate_fitness(self):
        self.population.calculate_fitness()

    def selection(self):
        # Pair each parent index from the first half of the population with
        # one chosen at random from the second half.
        parents = list()
        end = int(self.population_size / 2)
        no_of_parents = int(self.population_size / 2)
        for x in range(no_of_parents):
            p1 = random.randint(0, end - 1)
            p2 = random.randint(end, self.population_size - 1)
            parents.append([p1, p2])
        return parents

    def cross_over(self, parents):
        self.population.cross_over(parents)

    def mutation(self):
        self.population.mutation(self.mutation_rate)

    def clear_population(self):
        self.population.clear_population()
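The listing above defines only the GA operators. As a hedged sketch (the constructor defaults and the generation count are assumptions, since the report's listing does not show the class being driven), the algorithm might be run like this:

# Illustrative driver loop for GAAlgorithm; settings are assumptions.
ga = GAAlgorithm()              # assumed defaults for population/mutation
ga.initialization()
for generation in range(20):    # illustrative generation count
    ga.calculate_fitness()
    parents = ga.selection()
    ga.cross_over(parents)
    ga.mutation()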
Individual.py
import random
import string

import pandas

from classifier import DecisionTree

class Individual:
    def __init__(self, train_dataset, test_dataset, gene_length=18):
        self.gene_length = int(gene_length)
        # Each gene is one bit: 1 keeps the corresponding feature, 0 drops it.
        self.chromosome = [random.randint(0, 1) for x in range(self.gene_length)]
        self.train_dataset = train_dataset
        self.test_dataset = test_dataset
        self.fitness = 0

    def calculate_fitness(self):
        # Column names a..s: 18 feature columns plus the class label in the
        # last column.
        header = list(string.ascii_lowercase[0:(self.gene_length + 1)])
        kdd_train = pandas.read_csv(self.train_dataset, names=header)
        kdd_test = pandas.read_csv(self.test_dataset, names=header)
        selected_index = [header[x] for x, y in enumerate(self.chromosome) if y == 1]
        var_train, res_train = kdd_train[selected_index], kdd_train[header[18]]
        var_test, res_test = kdd_test[selected_index], kdd_test[header[18]]
        # __get_fitness (not shown in this listing) trains the decision tree
        # and scores it on the test split; fitness is that score as a percentage.
        self.fitness = self.__get_fitness(var_train, res_train, var_test, res_test) * 100
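The __get_fitness helper and the classifier module's internals are not shown in the report. Assuming the decision tree is scikit-learn's DecisionTreeClassifier (an assumption, not something the report confirms), the elided method could look roughly like this:

# Hypothetical sketch of the elided Individual.__get_fitness method,
# assuming scikit-learn's decision tree; not the report's actual code.
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

def __get_fitness(self, var_train, res_train, var_test, res_test):
    tree = DecisionTreeClassifier()
    tree.fit(var_train, res_train)        # train on the selected features only
    predictions = tree.predict(var_test)
    return accuracy_score(res_test, predictions)  # in [0, 1]; the caller scales by 100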
ABNIDS.py
# Change testing panel to avoid segmentation fault
from PyQt5 import QtCore, QtGui, QtWidgets
from PyQt5.QtGui import QIcon, QPixmap
from PyQt5.QtWidgets import (qApp, QFileDialog, QMessageBox, QMainWindow,
                             QDialog, QDialogButtonBox, QVBoxLayout, QHeaderView)
import os
import time
import threading

import pyshark
import matplotlib.pyplot as plt

import packet as pack
import GAAlgorithm
import Preprocess as data
import classifier
class Ui_MainWindow(object):
    def __init__(self):
        self.tree_classifier = classifier.DecisionTree()
        self.packet = pack.Packet()
        self.trained = False
        self.stop = False
        self.threadActive = False
        self.pause = False

    def plot_graph(self):
        # Bar chart of how many records were classified into each class.
        x = ['Normal', 'DoS', 'Probe']
        normal, dos, prob = self.tree_classifier.get_class_count()
        y = [normal, dos, prob]
        plt.bar(x, y, width=0.3, label="BARCHART")
        plt.xlabel('Classes')
        plt.ylabel('Count')
        plt.title('Graph Plotting')
        plt.legend()
        plt.show()
    def train_model(self):
        try:
            train_dataset, train_dataset_type = QFileDialog.getOpenFileName(
                MainWindow, "Select Training Dataset", "",
                "All Files (*);;CSV Files (*.csv)")
            if train_dataset:
                os.chdir(os.path.dirname(train_dataset))
                test_dataset, test_dataset_type = QFileDialog.getOpenFileName(
                    MainWindow, "Select Testing Dataset", "",
                    "All Files (*);;CSV Files (*.csv)")
            if train_dataset and test_dataset:
                generation = 0
                train_dataset = data.Dataset.refine_dataset(train_dataset,
                                                            "Train Preprocess.txt")
                # ... (the remainder of the genetic-algorithm training loop is
                # not shown in the report)
        except:
            try:
                # 'ga' refers to the GAAlgorithm instance used during training
                # (its creation is not shown in this listing).
                ga.clear_population()
            except:
                print("Err 00")
            finally:
                self.showdialog('Model train', 'Model training failed', 2)
    def static_testing(self):
        if self.isModelTrained():
            if self.threadActive:
                self.showdialog('Warning', 'Please stop the current test first', 3)
            else:
                test_dataset, test_dataset_type = QFileDialog.getOpenFileName(
                    MainWindow, "Select Testing Dataset", "",
                    "All Files (*);;CSV Files (*.csv)")
                if test_dataset:
                    try:
                        test_dataset = data.Dataset.refine_dataset(test_dataset,
                                                                   "Test Dataset.txt")
                        t1 = threading.Thread(target=self.static_testing_thread,
                                              name='Static testing',
                                              args=(test_dataset,))
                        t1.start()
                        self.threadActive = True
                    except:
                        self.showdialog('Error', 'Invalid Dataset', 2)
        else:
            self.showdialog('Warning', 'Model not trained', 3)
    def static_testing_thread(self, dataset):
        row = 0
        self.reset_all_content()
        with open(dataset, "r") as file:
            for line in file.readlines():
                try:
                    line = line.split(',')
                    result, result_type = self.tree_classifier.test_dataset(line)
                    self.insert_data(line, result, result_type, row)
                    row += 1
                    if self.pause:
                        while self.pause:  # busy-wait until resumed
                            pass
                    if self.isStop():
                        self.stop = False
                        break
                    time.sleep(0.05)
                except:
                    print("Err")
        self.threadActive = False
    def realtime_testing(self):
        if self.isModelTrained():
            if self.threadActive:
                self.showdialog('Warning', 'Please stop the current test first', 3)
            else:
                t2 = threading.Thread(target=self.realtime_testing_thread,
                                      name='Realtime testing')
                t2.start()
                self.threadActive = True
        else:
            self.showdialog('Warning', 'Model not trained', 3)
    def realtime_testing_thread(self):
        self.reset_all_content()
        self.packet.initiating_packets()
        t1 = time.time()
        attr_list = list()
        capture = pyshark.LiveCapture(interface='Wi-Fi')
        row = 0
        try:
            for p in capture.sniff_continuously():
                try:
                    if "<UDP Layer>" in str(p.layers) and "<IP Layer>" in str(p.layers):
                        attr_list = self.packet.udp_packet_attributes(p)
                        result, result_type = self.tree_classifier.test_dataset(attr_list)
                        self.insert_data(attr_list, result, result_type, row)
                        print(attr_list)
                        row += 1
                    elif "<TCP Layer>" in str(p.layers) and "<IP Layer>" in str(p.layers):
                        attr_list = self.packet.tcp_packet_attributes(p)
                        result, result_type = self.tree_classifier.test_dataset(attr_list)
                        self.insert_data(attr_list, result, result_type, row)
                        print(attr_list)
                        row += 1
                    # Refresh the packet window every 5 seconds so that the
                    # rate-based attributes track recent traffic.
                    if (time.time() - t1) > 5 and not self.isStop():
                        print("Updating list")
                        self.packet.initiating_packets()
                        t1 = time.time()
                    if self.pause:
                        while self.pause:  # busy-wait until resumed
                            pass
                    if self.isStop():
                        self.stop = False
                        break
                except:
                    print("Err")
        except:
            print("Error in capture loop")
    def pause_resume(self):
        if self.pause:
            self.pause = False
            self.btn_start.setText("Pause")
        else:
            self.pause = True
            self.btn_start.setText("Resume")
    def save_log_file(self):
        log = self.tree_classifier.get_log()
        url = QFileDialog.getSaveFileName(None, 'Save Log', 'untitled',
                                          "Text file (*.txt);;All Files (*)")
        if url[0]:
            try:
                url = url[0]
                with open(url, 'w') as file:
                    file.write(log)
                self.showdialog('Saved', f'File saved as {url}', 1)
            except:
                self.showdialog('Error', 'File not saved', 2)
    def stop_capturing_testing(self):
        if self.pause:
            self.pause = False
            self.btn_start.setText('Pause')
        if not self.stop:
            self.stop = True
        if self.threadActive:
            self.threadActive = False

    def reset_all_content(self):
        if self.pause:
            self.pause = False
            self.btn_start.setText('Pause')
        self.stop = False
        self.tree_classifier.reset_class_count()
        self.panel_capturing.clearContents()
        self.panel_capturing.setRowCount(0)
        self.panel_result.clearContents()
        self.panel_result.setRowCount(0)
        self.panel_testing.clear()
    def insert_data(self, line, result, result_type, row):
        # Show the first four attributes of every record in the capture panel.
        self.panel_capturing.insertRow(row)
        for column, item in enumerate(line[0:4:1]):
            self.panel_capturing.setItem(row, column,
                                         QtWidgets.QTableWidgetItem(str(item)))
        self.panel_capturing.scrollToBottom()
        self.panel_testing.clear()
        self.panel_testing.addItem(str(line[0:4:1]))
        # Records not classified as 0 (normal traffic) are added to the
        # result panel.
        if not result == 0:
            result_row = self.panel_result.rowCount()
            self.panel_result.insertRow(result_row)
            x = [row + 1, line[1], line[2], result_type]
            for column, item in enumerate(x):
                self.panel_result.setItem(result_row, column,
                                          QtWidgets.QTableWidgetItem(str(item)))
            self.panel_result.scrollToBottom()
    def clickexit(self):
        buttonReply = QMessageBox.question(MainWindow, 'Exit',
                                           "Are you sure you want to exit?",
                                           QMessageBox.Yes | QMessageBox.No,
                                           QMessageBox.No)
        if buttonReply == QMessageBox.Yes:
            if self.threadActive:
                self.pause = False
                self.stop = True
            qApp.quit()
        else:
            print('No clicked.')

    def isStop(self):
        return self.stop
    def showdialog(self, title, text, icon_type):
        msg = QMessageBox()
        if icon_type == 1:
            msg.setIcon(QMessageBox.Information)
        elif icon_type == 2:
            msg.setIcon(QMessageBox.Critical)
        elif icon_type == 3:
            msg.setIcon(QMessageBox.Warning)
        msg.setText(text)
        msg.setWindowTitle(title)
        msg.setStandardButtons(QMessageBox.Ok)
        msg.buttonClicked.connect(self.msgbtn)
        retval = msg.exec_()

    def msgbtn(self):
        self.progressBar.setProperty("value", 0)

    def isModelTrained(self):
        return self.trained
    def setupUi(self, MainWindow):
        MainWindow.setObjectName("MainWindow")
        path = os.path.dirname(os.path.abspath(__file__))
        MainWindow.setWindowIcon(QtGui.QIcon(os.path.join(path, 'icon.png')))
        MainWindow.resize(908, 844)
        sizePolicy = QtWidgets.QSizePolicy(QtWidgets.QSizePolicy.Fixed,
                                           QtWidgets.QSizePolicy.Preferred)
        sizePolicy.setHorizontalStretch(0)
        sizePolicy.setVerticalStretch(0)
        sizePolicy.setHeightForWidth(MainWindow.sizePolicy().hasHeightForWidth())
        MainWindow.setSizePolicy(sizePolicy)
        MainWindow.setIconSize(QtCore.QSize(30, 30))
        self.centralwidget = QtWidgets.QWidget(MainWindow)
        self.centralwidget.setObjectName("centralwidget")
        self.gridLayout = QtWidgets.QGridLayout(self.centralwidget)
        self.gridLayout.setObjectName("gridLayout")
        spacerItem = QtWidgets.QSpacerItem(10, 10, QtWidgets.QSizePolicy.Expanding,
                                           QtWidgets.QSizePolicy.Minimum)
        self.gridLayout.addItem(spacerItem, 1, 0, 1, 1)
        spacerItem1 = QtWidgets.QSpacerItem(20, 20, QtWidgets.QSizePolicy.Minimum,
                                            QtWidgets.QSizePolicy.Maximum)
        self.gridLayout.addItem(spacerItem1, 4, 1, 1, 1)
        spacerItem2 = QtWidgets.QSpacerItem(20, 10, QtWidgets.QSizePolicy.Minimum,
                                            QtWidgets.QSizePolicy.Fixed)
        self.gridLayout.addItem(spacerItem2, 6, 1, 1, 1)
        self.horizontalLayout_2 = QtWidgets.QHBoxLayout()
        self.horizontalLayout_2.setObjectName("horizontalLayout_2")
        spacerItem3 = QtWidgets.QSpacerItem(15, 10, QtWidgets.QSizePolicy.Ignored,
                                            QtWidgets.QSizePolicy.Minimum)
        self.horizontalLayout_2.addItem(spacerItem3)
        self.btn_start = QtWidgets.QPushButton(self.centralwidget)
        self.btn_start.setObjectName("btn_start")
        self.btn_start.setText('Pause')
        self.btn_start.clicked.connect(self.pause_resume)
        self.horizontalLayout_2.addWidget(self.btn_start)
        # ####################################################
        self.btn_pause = QtWidgets.QPushButton(self.centralwidget)
        self.btn_pause.setText("Stop Capturing/Testing")
        self.btn_pause.setObjectName("btn_pause")
        self.btn_pause.clicked.connect(self.stop_capturing_testing)
        self.horizontalLayout_2.addWidget(self.btn_pause)
        self.gridLayout.addLayout(self.horizontalLayout_2, 8, 1, 1, 1)
        self.horizontalLayout = QtWidgets.QHBoxLayout()
        self.horizontalLayout.setObjectName("horizontalLayout")
        # #####################################################
        self.btn_modeltrain = QtWidgets.QPushButton(self.centralwidget)
        self.btn_modeltrain.setText("Train Model")
        self.btn_modeltrain.setObjectName("btn_modeltrain")
        self.btn_modeltrain.clicked.connect(self.train_model)
        self.horizontalLayout.addWidget(self.btn_modeltrain)
        # ######################################################
        self.btn_statictesting = QtWidgets.QPushButton(self.centralwidget)
        self.btn_statictesting.setText("Static Testing")
        self.btn_statictesting.setObjectName("btn_statictesting")
        self.btn_statictesting.clicked.connect(self.static_testing)
        self.horizontalLayout.addWidget(self.btn_statictesting)
        # ######################################################
        self.btn_realtimetesting = QtWidgets.QPushButton(self.centralwidget)
        self.btn_realtimetesting.setText("Real Time Testing")
        self.btn_realtimetesting.setObjectName("btn_realtimetesting")
        self.btn_realtimetesting.clicked.connect(self.realtime_testing)
        self.horizontalLayout.addWidget(self.btn_realtimetesting)
        # ######################################################
        self.btn_savelog = QtWidgets.QPushButton(self.centralwidget)
        self.btn_savelog.setText("Save Log")
        icon5 = QtGui.QIcon()
        self.btn_savelog.setObjectName("btn_savelog")
        self.btn_savelog.clicked.connect(self.save_log_file)
        self.horizontalLayout.addWidget(self.btn_savelog)
        # ######################################################
        self.btn_graph = QtWidgets.QPushButton(self.centralwidget)
        self.btn_graph.setText("Plot Graph")
        self.btn_graph.setObjectName("btn_graph")
        self.btn_graph.clicked.connect(self.plot_graph)
        self.horizontalLayout.addWidget(self.btn_graph)
        # ######################################################
        self.btn_exit = QtWidgets.QPushButton(self.centralwidget)
        self.btn_exit.setText("Exit")
        self.btn_exit.setObjectName("btn_exit")
        self.btn_exit.clicked.connect(self.clickexit)
        self.horizontalLayout.addWidget(self.btn_exit)
        # ######################################################
        self.gridLayout.addLayout(self.horizontalLayout, 3, 1, 1, 2)
        spacerItem4 = QtWidgets.QSpacerItem(20, 10, QtWidgets.QSizePolicy.Minimum,
                                            QtWidgets.QSizePolicy.Fixed)
        self.gridLayout.addItem(spacerItem4, 8, 1, 1, 1)
        spacerItem5 = QtWidgets.QSpacerItem(20, 10, QtWidgets.QSizePolicy.Minimum,
                                            QtWidgets.QSizePolicy.Fixed)
        self.gridLayout.addItem(spacerItem5, 0, 1, 1, 1)
        self.panel_capturing = QtWidgets.QTableWidget(self.centralwidget)
        sizePolicy = QtWidgets.QSizePolicy(QtWidgets.QSizePolicy.Preferred,
                                           QtWidgets.QSizePolicy.Preferred)
        sizePolicy.setHorizontalStretch(10)
        sizePolicy.setVerticalStretch(0)
        sizePolicy.setHeightForWidth(self.panel_capturing.sizePolicy().hasHeightForWidth())
        self.panel_capturing.setSizePolicy(sizePolicy)
        self.panel_capturing.setRowCount(0)
        self.panel_capturing.setColumnCount(4)
        self.panel_capturing.setObjectName("panel_capturing")
        item = QtWidgets.QTableWidgetItem()
        self.panel_capturing.setHorizontalHeaderItem(0, item)
        item = QtWidgets.QTableWidgetItem()
        self.panel_capturing.setHorizontalHeaderItem(1, item)
        item = QtWidgets.QTableWidgetItem()
        self.panel_capturing.setHorizontalHeaderItem(2, item)
        item = QtWidgets.QTableWidgetItem()
        self.panel_capturing.setHorizontalHeaderItem(3, item)
        self.gridLayout.addWidget(self.panel_capturing, 4, 1, 4, 1)
        self.label = QtWidgets.QLabel(self.centralwidget)
        sizePolicy = QtWidgets.QSizePolicy(QtWidgets.QSizePolicy.Fixed,
                                           QtWidgets.QSizePolicy.Fixed)
        sizePolicy.setHorizontalStretch(0)
        sizePolicy.setVerticalStretch(0)
        sizePolicy.setHeightForWidth(self.label.sizePolicy().hasHeightForWidth())
        self.label.setSizePolicy(sizePolicy)
        self.label.setLayoutDirection(QtCore.Qt.LeftToRight)
        self.label.setAutoFillBackground(False)
        self.label.setText("")
        path = os.path.dirname(os.path.abspath(__file__))
        path = path + r'\icons'
        self.label.setPixmap(QtGui.QPixmap(os.path.join(path, 'logo.jpg')))
        self.label.setScaledContents(True)
        self.label.setAlignment(QtCore.Qt.AlignCenter)
        self.label.setObjectName("label")
        self.gridLayout.addWidget(self.label, 1, 1, 1, 1)
        spacerItem6 = QtWidgets.QSpacerItem(10, 20, QtWidgets.QSizePolicy.Minimum,
                                            QtWidgets.QSizePolicy.Fixed)
        self.gridLayout.addItem(spacerItem6, 2, 1, 1, 1)
        self.panel_testing = QtWidgets.QListWidget(self.centralwidget)
        sizePolicy = QtWidgets.QSizePolicy(QtWidgets.QSizePolicy.Expanding,
                                           QtWidgets.QSizePolicy.Preferred)
        sizePolicy.setHorizontalStretch(0)
        sizePolicy.setVerticalStretch(0)
        sizePolicy.setHeightForWidth(self.panel_testing.sizePolicy().hasHeightForWidth())
        self.panel_testing.setSizePolicy(sizePolicy)
        self.panel_testing.setVerticalScrollBarPolicy(QtCore.Qt.ScrollBarAlwaysOn)
        self.panel_testing.setHorizontalScrollBarPolicy(QtCore.Qt.ScrollBarAsNeeded)
        self.panel_testing.setObjectName("panel_testing")
        self.gridLayout.addWidget(self.panel_testing, 9, 1, 1, 1)
        self.progressBar = QtWidgets.QProgressBar(self.centralwidget)
        self.progressBar.setProperty("value", 0)
        self.progressBar.setObjectName("progressBar")
        self.gridLayout.addWidget(self.progressBar, 10, 1, 1, 2)
        # ----------------------------------------------------------------- #
        self.panel_result = QtWidgets.QTableWidget(self.centralwidget)
        sizePolicy = QtWidgets.QSizePolicy(QtWidgets.QSizePolicy.Preferred,
                                           QtWidgets.QSizePolicy.Preferred)
        sizePolicy.setHorizontalStretch(10)
        sizePolicy.setVerticalStretch(0)
        sizePolicy.setHeightForWidth(self.panel_result.sizePolicy().hasHeightForWidth())
        self.panel_result.setSizePolicy(sizePolicy)
        self.panel_result.setRowCount(0)
        self.panel_result.setColumnCount(4)
        self.panel_result.setObjectName("panel_result")
        item = QtWidgets.QTableWidgetItem()
        self.panel_result.setHorizontalHeaderItem(0, item)
        item = QtWidgets.QTableWidgetItem()
        self.panel_result.setHorizontalHeaderItem(1, item)
        item = QtWidgets.QTableWidgetItem()
        self.panel_result.setHorizontalHeaderItem(2, item)
        item = QtWidgets.QTableWidgetItem()
        self.panel_result.setHorizontalHeaderItem(3, item)
        self.gridLayout.addWidget(self.panel_result, 4, 2, 6, 1)
        # ----------------------------------------------------------------- #
        MainWindow.setCentralWidget(self.centralwidget)
        self.menubar = QtWidgets.QMenuBar(MainWindow)
        self.menubar.setGeometry(QtCore.QRect(0, 0, 908, 26))
        self.menubar.setObjectName("menubar")
        self.menuFile = QtWidgets.QMenu(self.menubar)
        self.menuFile.setObjectName("menuFile")
        self.menuAbout = QtWidgets.QMenu(self.menubar)
        self.menuAbout.setObjectName("menuAbout")
        MainWindow.setMenuBar(self.menubar)
        self.statusbar = QtWidgets.QStatusBar(MainWindow)
        self.statusbar.setObjectName("statusbar")
        MainWindow.setStatusBar(self.statusbar)
        self.actionNew = QtWidgets.QAction(MainWindow)
        self.actionNew.setObjectName("actionNew")
        self.actionOpen = QtWidgets.QAction(MainWindow)
        self.actionOpen.setObjectName("actionOpen")
        self.actionExit = QtWidgets.QAction(MainWindow)
        self.actionExit.setObjectName("actionExit")
        self.actionHelp = QtWidgets.QAction(MainWindow)
        self.actionHelp.setObjectName("actionHelp")
        self.menuFile.addAction(self.actionNew)
        self.menuFile.addAction(self.actionOpen)
        self.menuFile.addSeparator()
        self.menuFile.addAction(self.actionExit)
        self.actionExit.triggered.connect(qApp.quit)
        self.menuAbout.addAction(self.actionHelp)
        self.menubar.addAction(self.menuFile.menuAction())
        self.menubar.addAction(self.menuAbout.menuAction())
        self.retranslateUi(MainWindow)
        QtCore.QMetaObject.connectSlotsByName(MainWindow)
if __name__ == "__main__":
    import sys
    app = QtWidgets.QApplication(sys.argv)
    MainWindow = QtWidgets.QMainWindow()
    ui = Ui_MainWindow()
    ui.setupUi(MainWindow)
    MainWindow.show()
    sys.exit(app.exec_())
Conclusion:
Network traffic logs describe patterns of behaviour in network traffic, whether intrusive or normal. The decision tree technique is well suited to extracting intrusion characteristics from these logs for an IDS, and the resulting rules are implemented in the genetic algorithm for prevention. The technique also produces efficient, optimized firewall rules, for example by avoiding redundancy.
REFERENCES:
[1] R. M. A. Ujjan, Z. Pervez, K. Dahal, A. K. Bashir, R. Mumtaz, and J. González, "Towards sFlow and adaptive polling sampling for deep learning based DDoS detection in SDN," Future Generation Computer Systems, vol. 111, pp. 763-779, 2020, doi: 10.1016/j.future.2019.10.015.
[2] "Software Defined Networking Definition." https://www.opennetworking.org/sdn-definition/ (accessed March 2, 2020).
[3] S. Garg, K. Kaur, N. Kumar, and J. J. P. C. Rodrigues, "Hybrid Deep-Learning-Based Anomaly Detection Scheme for Suspicious Flow Detection in SDN: A Social Multimedia Perspective," IEEE Transactions on Multimedia, vol. 21, no. 3, pp. 566-578, 2019, doi: 10.1109/tmm.2019.2893549.
[4] M. Nobakht, V. Sivaraman, and R. Boreli, "A Host-Based Intrusion Detection and Mitigation Framework for Smart Home IoT Using OpenFlow," presented at the 2016 11th International Conference on Availability, Reliability and Security (ARES), 2016.
[5] M. S. Elsayed, N. Le-Khac, S. Dev, and A. D. Jurcut, "Machine Learning Techniques for Detecting Attacks in SDN," in 2019 IEEE 7th International Conference on Computer Science and Network Technology (ICCSNT), 19-20 Oct. 2019, pp. 277-281, doi: 10.1109/ICCSNT47585.2019.8962519.
Publication
Submission date: 02-Mar-2022 09:59 AM (UTC-0600)
Submission ID: 1649798424
File name: CTION_IN_SOFTWARE_DEFINED_NETWORK_USING_MACHINE_LEARNING_1.docx (365.3K)
Word count: 2100
Character count: 11888

INTRUSION DETECTION IN SOFTWARE DEFINED NETWORK USING MACHINE LEARNING

ORIGINALITY REPORT
Similarity index: 2% (Internet sources: 1%, Publications: 1%, Student papers: 1%)
PRIMARY SOURCES:
1. Internet Source: 1%
2. Peng Cui, "A Tighter Analysis of Set Cover Greedy Algorithm for Test Set", Lecture Notes in Computer Science, 2007 (Publication): <1%
3. Bambang Susilo, Riri Fitri Sari, "Intrusion Detection in Software Defined Network Using Deep Learning Approach", 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC), 2021 (Publication): <1%
4. thesai.org (Internet Source): <1%
Exclude quotes: On. Exclude matches: Off.
Detection of Attacks (DoS, Probe) Using Genetic Algorithm
ABSTRACT:
Intrusion detection techniques can be divided into two kinds: misuse detection and anomaly detection. Misuse detection identifies all known (intrusive) attacks by matching system audit traffic against predefined intrusion patterns. Anomaly detection instead first learns a normal activity profile and then flags every system event that does not match the established profile. The main strength of misuse detection is its high detection rate for known attacks, with the drawback that new or unexpected attacks are hard to identify; the strength of anomaly detection is precisely its ability to recognise such novel attacks, although at the cost of more false alarms. Machine-learning techniques built on network monitoring have been applied in diverse fields; one example is a social media monitoring system for analysing and detecting road traffic accidents using bi-directional long short-term memory neural networks.
The proposed method retrieves traffic-related information from social media (Facebook and Twitter) using query-based crawling: this process collects sentences related to any traffic event, such as traffic jams or road closures. A number of preprocessing steps are then applied (stemming, tokenisation, POS tagging, segmentation, and so on) to convert the collected data into a structured form. The data are then automatically labelled as "traffic" or "non-traffic" using the latent Dirichlet allocation (LDA) algorithm. Traffic-related sentences are further divided into three kinds: positive, negative, and neutral; the output of this classification is the polarity (positive, negative, or neutral) of each traffic-related sentence. A bag-of-words (BoW) representation is then used to convert each sentence into a one-hot encoding that feeds bi-directional LSTM networks (Bi-LSTM). After training, a multi-layer neural network with a softmax output classifies sentences according to location, traffic event, and polarity. The proposed method is compared with various machine-learning and deep-learning baselines in terms of accuracy, F-score, and other metrics.
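To make the Bi-LSTM classification stage concrete, here is a minimal sketch of such a sentence classifier in Keras. Every size in it (vocabulary, embedding and LSTM dimensions, the three polarity classes, sentence length) is an illustrative assumption, not a value taken from this report:

# Minimal Bi-LSTM sentence classifier (illustrative; all sizes assumed).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense

VOCAB_SIZE = 10000   # assumed vocabulary size
MAX_LEN = 50         # assumed maximum sentence length in tokens

model = Sequential([
    Embedding(VOCAB_SIZE, 64, input_length=MAX_LEN),  # token ids to vectors
    Bidirectional(LSTM(32)),                          # reads each sentence both ways
    Dense(3, activation='softmax'),                   # positive / negative / neutral
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()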
LITERATURE REVIEW
EXISTING SYSTEM:
Problem Statement:
Attacks are very difficult, frequent, and time-consuming to isolate from normal network activity.
Analysts need to examine large and wide-ranging data to monitor the severity of intrusions.
A detection method is therefore needed that reflects the current flow of network traffic.
Coupling a firewall with an IDS, also known as an intrusion prevention system (IPS), can not only detect an attack but also prevent it.
Proposed System:
Advantages:
HARDWARE REQUIREMENTS:
SOFTWARE REQUIREMENTS:
BLOCK DIAGRAM:
FLOW DIAGRAM:
Decision tree
Introduction
The techniques covered so far have been fairly hard to follow. Now let us start with the decision tree; it may well be the most straightforward algorithm in machine learning. There is not much to it: it is one of the most widely used and practical methods because it is easy to use and to explain.
What is a Decision Tree?
It is a tool with applications in many different areas. Decision trees can be used for both classification and regression problems. The name itself suggests that it uses a tree-like structure to show the predictions that result from a sequence of feature-based splits. It starts at a root node and ends with a decision at the leaves. Before we study the decision tree in detail, let us review some terminology.
Root Node: the node at the top of the decision tree, from which the population first starts to divide according to various features.
Decision Nodes: the nodes obtained after splitting the root node are called decision nodes.
Leaf Nodes: nodes that cannot be split any further are called leaf (or terminal) nodes.
Sub-tree: a subsection of the decision tree is called a sub-tree.
Pruning: removing nodes to keep the tree from overfitting.
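To make the terminology concrete, the short sketch below fits a small decision tree with scikit-learn and prints its splits (root node, decision nodes, and leaves). The two features and four records are made up for illustration and are not the project's dataset:

# Toy decision-tree example (illustrative data only).
from sklearn.tree import DecisionTreeClassifier, export_text

# Four records with two features each, labelled 'normal' or 'dos'.
X = [[0, 150], [1, 900], [0, 120], [1, 880]]
y = ['normal', 'dos', 'normal', 'dos']

tree = DecisionTreeClassifier(max_depth=2)
tree.fit(X, y)
# export_text prints the learned splits: root node, decision nodes, leaves.
print(export_text(tree, feature_names=['land', 'src_bytes']))
print(tree.predict([[1, 850]]))  # -> ['dos']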
MODULES:
Dataset collection
Data Cleaning
Feature Extraction
Model training
Testing model
Performance Evaluation
Prediction
Dataset collection:
Data collection helps you find ways of tracking past events by recording and analysing them. It also lets you build predictive models with machine-learning tools to anticipate future changes. Since a predictive model is only as good as the data it is trained on, good data collection is the most effective way to improve performance. The data should be error-free (no garbage values, no outliers) and should contain information relevant to the task at hand. For example, a model of loan defaults may gain little from the loan amount itself, yet may benefit from gas prices over time. In this module, we collect the data from a Kaggle dataset; this data contains information on annual variations.
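A minimal sketch of this step, assuming pandas is installed; the CSV file name is hypothetical, since the report does not name the downloaded file:

# Load the dataset from a CSV file (file name is hypothetical).
import pandas as pd

df = pd.read_csv("kdd_dataset.csv")
print(df.shape)   # number of records and attributes
print(df.head())  # first few rows for a quick sanity check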
Data cleaning:
Data cleaning is an important part of every machine-learning workflow. In this module, data cleaning prepares the data for analysis by removing or correcting wrong, incomplete, duplicated, or misleading records. It also lets you inspect the data and discover what cleaning is actually needed.
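A minimal cleaning sketch with pandas, continuing from the hypothetical df above; the particular steps (dropping duplicates and rows with missing values) are illustrative assumptions, not the report's exact procedure:

# Basic cleaning steps (illustrative): drop exact duplicates and rows with
# missing values, then reset the index.
df = df.drop_duplicates()
df = df.dropna()
df = df.reset_index(drop=True)
print(df.shape)  # size after cleaning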
Feature Extraction:
Model training:
Performance Evaluation:
RESULT: