Explainable Final Doc 3
Explainable Final Doc 3
SYSTEM
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE INFORMATION TECHNOLOGY
Submitted by
NALLURI DHARANI 21491A0713
PINNIKAHEMALATHA 21491A0715
MAMILAPALLI PAVANKALYAN 21491A0743
GALLA RAVITEJA 21491A0752
SHAIK NISAR 22495A0702
“Task successful” makes everyone happy. But the happiness will be gold
without glitter if we didn’t state the persons who have supported us to make success.
We express our gratitude to the hon’ble Chairman Sri Dr. N. SRI GAYATRI
GARU, M.B.B.S., M.D.,QIS Group of Institutions, Ongole for his valuable
suggestions and advices in the B. Tech Course.
We would like to express our thankfulness to CSCDE & DPSR for their constant
motivation and valuable help throughout the project.
Finally, we would like to thanks our Parents, Family and Friends for their co-operation
to complete this Project.
Submitted by
ACKNOWLEDGEMENT III
DECLARATION IV
ABSTRACT V
CHAPTER
1 INTRODUCTION 3-4
1.1 Overview 3
1.2 Problem Statement 3
1.3 Objective
4
1.4 Scope of the Project
CHAPTER
2 LITERATURE SURVEY 5-8
2.1 Overview of Intrusion Detection 5
Systems
2.2 Explainable Artificial Intelligence (XAI) in 5
Security Systems
2.3 Signature-Based Intrusion Detection 6
Systems (SBIDS)
2.4 Explainability in SBIDS 7
2.5 Hybrid and Advanced Models of 8
Explainable SBIDS
CHAPTER
3 SYSTEM ANALYSIS 9-13
3.1 Existing Systems and Their Limitations 9
3.2 Proposed System 10-11
3.3 System Requirements 11-12
3.4 System Study 12-13
CHAPTER
4 SYSTEM DESIGN 14-23
4.1 SYSTEM ARCHITECTURE 14-15
4.2 DATA FLOW DIAGRAMS 16-17
4.3 UML DIAGRAM 18-21
4.4 IMPLEMENTATION MODULES 22-23
CHAPTER
5 SOFTWARE ENVIRONMENT 24-26
5.1 Operating System 24
5.2 Programming Languages 25
CHAPTER
CHAPTER
7 RESULTS AND DISCUSSION 31-49
7.1 CODE 31-37
7.2 INPUT & OUTPUT 38-46
7.3 RESULT 47-48
DISCUSSION
7.4 49
CHAPTER
8 FUTURE DEVELOPMENT AND 50-51
CONCLUSION
CHAPTER
9 APPENDIX 52-55
9.1 GLOSSARY OF TECHNICAL TERMS 52-53
9.2 REFERENCES TO RESEARCH PAPERS 54-55
USED IN THE PROJECT
LIST OF FIGURES
1|Page
LIST OF SYMBOLS ANDABBREVIATIONS
AI Artificial Intelligence
TTS Text-to-Speech
UI User Interface
2|Page
CHAPTER 1
INTRODUCTION
1.1 Overview
In today's hyper-connected world, securing digital assets, sensitive data, and networks is
critical. As cyber threats continue to grow in complexity and sophistication, organizations face the
daunting challenge of defending their infrastructure against a wide range of attacks, including
malware, phishing, denial of service, and more advanced persistent threats. One of the key
technologies used in cybersecurity to detect and prevent these attacks is the Intrusion Detection
System (IDS).
While SBIDS has been proven effective in identifying known attack vectors and detecting
malicious activities based on signature matching, it has certain limitations. The primary
shortcoming lies in its lack of transparency and explainability. Security analysts and network
administrators often receive alerts from SBIDS without sufficient insight into why the system
flagged certain activities as malicious. As a result, they are left with limited information, making
it challenging to verify the validity of alerts, reduce false positives, and understand the root cause
of the detection. Moreover, security teams need to quickly assess the nature and severity of the
threat to respond appropriately, which is often hindered by the lack of clear explanations.
To address these limitations, the concept of Explainability has been introduced to enhance
traditional SBIDS. An Explainable Signature-Based Intrusion Detection System (SBIDS) not
only detects threats but also provides detailed insights and justifications behind each detection. By
incorporating explainability, the system can articulate why an alert was triggered, what specific
signature was matched, and how it relates to a particular attack pattern. This transparency helps
security personnel to make faster and more informed decisions, improving their ability to mitigate
threats and respond in real-time.
3|Page
1.3 Objective
Explainability in IDSs also plays a vital role in increasing the overall trustworthiness of the
system. In many cases, organizations that rely heavily on automated security systems need to have
confidence that the decisions made by these systems are accurate and reliable. By making the
reasoning behind each detection clear and interpretable, Explainable SBIDS builds that trust,
empowering administrators to audit the system's performance, understand the underlying logic,
and ensure that the alerts are legitimate.
4|Page
CHAPTER 2
LITERATURE REVIEW
The literature on Intrusion Detection Systems (IDS) and Explainability in security
mechanisms is vast, reflecting the growing importance of detecting cyber threats efficiently and
transparently. This section provides an overview of key research contributions and existing work
in the fields of Signature-Based Intrusion Detection Systems (SBIDS) and explainable artificial
intelligence (XAI), particularly focusing on how explainability can be integrated into IDS to
enhance security operations.
Intrusion Detection Systems (IDS) have evolved significantly over the years, transitioning
from basic signature-based models to more sophisticated hybrid and explainable systems. The
integration of Explainable Artificial Intelligence (XAI) techniques has enhanced the transparency
and effectiveness of IDS, addressing the limitations of traditional methods.[1]
Between 1994 and 1999, foundational research laid the groundwork for IDS development.
Kumar & Spafford (1994) highlighted the efficiency of signature-based intrusion detection
(SBIDS) in identifying known threats. Around the same time, Denning (1987) introduced
anomaly-based detection, which complemented SBIDS by identifying deviations from normal
behavior.[1] Roesch (1999) developed Snort, a widely adopted open-source IDS,[2] while Paxson
(1999) introduced Bro IDS (now Zeek), a robust tool for network monitoring.[3]
From 2000 to 2007, researchers focused on the advancements and challenges in IDS.
Axelsson (2000) emphasized the limitation of SBIDS in detecting novel threats, and Mell,
Scarfone & Romanosky (2007) identified the increasing difficulty of detecting zero-day attacks.
Patcha & Park (2007) further explored anomaly detection, discussing its potential but also
highlighting its high false-positive rate.[4]
Between 2010 and 2016, the shift toward hybrid models and explainability in IDS began.
Sommer & Paxson (2010) documented the persistent issue of false positives in IDS, prompting the
development of improved detection mechanisms. Ahmed et al. (2016) proposed hybrid IDS
models that combined SBIDS with anomaly detection to increase accuracy. Ribeiro et al. (2016)
introduced LIME (Local Interpretable Model-agnostic Explanations), a framework that enhanced
the transparency of AI-driven security models, influencing the evolution of explainable IDS.[4]
5|Page
The period from 2017 to 2019 saw a significant rise in explainability research for IDS.
Doshi-Velez & Kim (2017) reviewed XAI techniques, including decision trees and rule-based
learning, to improve model interpretability in security applications. Gunning (2017) set forth
guidelines for interpretable AI models, emphasizing their importance in cybersecurity. Gadepally
et al. (2019) studied the need for explainability in IDS to enhance trust and operational efficiency,
while Gilmer et al. (2018) showcased how explainable models improved decision-making in
critical cybersecurity environments.[5]
6|Page
generation in IDS. Preece et al. (2022) leveraged NLP to generate human-readable explanations
for security alerts, improving administrator response times.
While traditional SBIDS provides an effective mechanism for detecting known threats, it has
long been criticized for operating as a “black box.” Administrators are often presented with alerts
but lack clear information on why specific traffic was flagged as suspicious. The literature points
to several key areas where explainability can enhance SBIDS:
The integration of explainability in SBIDS is still an emerging area, but there are promising studies
that combine SBIDS with machine learning and XAI techniques:
• Hybrid IDS with Explainability: Research by Shashidhar et al. (2020) explores the
integration of machine learning with traditional SBIDS to enhance detection capabilities
and provide more granular explanations. These hybrid systems are designed to detect novel
attacks using anomaly detection and then generate explanations by comparing the
anomalous traffic to known attack signatures.[5]
• Automated Generation of Explanations: Automated systems that generate explanations
for IDS decisions are also gaining traction in the literature. Contributions such as those
from Gunning et al. (2019) suggest that explainability can be enhanced through rule
extraction techniques and interpretability layers that automatically generate
justifications for each alert based on the underlying rules and signatures.[5]
• Shashidhar et al. (2020) explored hybrid models integrating machine learning with SBIDS
to enhance detection and explainability. Research on automated explanation systems has
gained traction, with Amershi et al. (2023) focusing on AI-assisted decision-making in
cybersecurity to enhance human-AI collaboration. Hybrid SBIDS models combining
7|Page
signature-based and anomaly-based techniques are being developed to provide improved
detection accuracy while maintaining explainability.[5]
The literature highlights the growing need for Explainable Signature-Based Intrusion Detection
Systems (SBIDS) to improve transparency, trust, and effectiveness in detecting cyber threats.
While SBIDS has long been a mainstay in network security, integrating explainability addresses
its shortcomings by providing security teams with actionable, interpretable insights into the
reasoning behind detections. Explainable IDS systems hold great promise in reducing false
positives, enhancing decision-making, and fostering trust in automated security tools. The
research direction is now moving towards combining traditional SBIDS with XAI frameworks to
ensure that these systems are not only effective but also transparent and auditable, aligning with
modern cybersecurity demands.
8|Page
CHAPTER 3
SYSTEM ANALYSIS
System analysis is a critical step in understanding the design, architecture, functionality,
and limitations of any system. In this section, we analyze both the existing system and the
proposed system, outline the necessary system requirements, and conduct a thorough system
study to ensure a clear understanding of how the system will operate and what improvements are
necessary.
The existing system in this context refers to the traditional Signature-Based Intrusion
Detection Systems (SBIDS), which are widely used in the cybersecurity industry for detecting
and mitigating network threats. The existing SBIDS work by comparing incoming network traffic
against a database of predefined attack signatures. If a match is found, the system flags the activity
as potentially malicious and generates an alert.
While the existing SBIDS has proven effective in detecting known threats, it suffers from
several limitations:
• Lack of Explainability: The biggest limitation of the current SBIDS is its inability to
explain why a particular alert was triggered. Analysts only see the result (i.e., the alert)
without any justification or reasoning behind it. This leads to a lack of trust in the system
and difficulties in verifying the validity of the alerts.
• Unable to Detect Zero-Day Attacks: The SBIDS relies on a predefined set of signatures
that represent known attacks. Any new or unknown attack (such as zero-day exploits) will
not be detected since no signature exists for it. This makes the system less effective against
emerging threats.
• High False Positives: The existing system often generates a high number of false
positives—alerts that indicate an attack when there is none. These false alarms overwhelm
security teams and make it difficult for them to prioritize real threats.
• Manual Analysis: The lack of explainability means that security analysts must manually
inspect logs and data to verify each alert, which increases their workload and slows down
response times.
• Static Signature Databases: Since SBIDS depends on static databases of attack
signatures, it must be regularly updated with the latest signatures. This reactive approach
leaves a window of vulnerability until updates are applied.
9|Page
3.1.2 Strengths of the Existing System
The Proposed System aims to improve upon the existing SBIDS by introducing the concept
of Explainability to enhance transparency, trust, and effectiveness. The Explainable Signature-
Based Intrusion Detection System (ESBIDS) integrates explainability features into the
traditional signature-based approach, providing detailed and interpretable information about each
detection.
10 | P a g e
• Automated Explanations Using AI: The system may utilize Explainable AI (XAI)
techniques to automatically generate and present explanations for detected events. Methods
like decision trees, rule-based learning, and feature importance mapping (e.g., SHAP
values) can help explain which characteristics of the traffic matched the malicious
signature.
• Processor: Multi-core processor with high performance (e.g., Intel Core i7 or AMD Ryzen
7) to handle real-time network traffic analysis.
• Memory: Minimum of 16GB RAM to ensure smooth operation, with more recommended
for environments with large traffic volumes.
• Storage: High-speed SSDs with at least 1TB of storage to store signatures, logs, and event
data.
• Network Interface: High-speed network interface cards (NICs) capable of handling large
volumes of traffic at gigabit or higher speeds.
11 | P a g e
3.3.2 Software Requirements
• Operating System: Linux-based OS (e.g., Ubuntu, CentOS) for stable and secure
deployment, although other OS environments (e.g., Windows) may also be supported.
• Database: Relational database management systems (e.g., MySQL, PostgreSQL) for
storing signature databases, event logs, and explanations.
• IDS Software: Existing SBIDS software (e.g., Snort, Suricata) which will be augmented
with explainability modules.
• Explainability Frameworks: XAI libraries such as LIME, SHAP, or Sklearn for
integrating machine learning models and generating explanations.
• Visualization Tools: Tools like Grafana or Kibana for visualizing alerts, logs, and
explanations on dashboards.
• Network Topology: The system should be placed at key points in the network architecture,
such as between the firewall and internal network, or monitoring multiple points for better
visibility.
• Regular Updates: Regular updating of the signature database and explainability models is
essential to ensure that the system remains capable of detecting new and emerging threats.
A system study helps to analyze the feasibility, operational requirements, and potential
impacts of the proposed system. This includes a thorough understanding of how the system
interacts with existing infrastructure and how it improves upon the current security posture.
• Technical Feasibility: The proposed system relies on existing SBIDS technology, which
is widely supported. The addition of explainability modules can be achieved using well-
established XAI frameworks, making the technical implementation feasible with existing
resources.
• Economic Feasibility: While the initial cost of integrating explainability features may
require investment in development and training, the long-term savings in terms of reduced
false positives, quicker response times, and enhanced security far outweigh the initial costs.
• Operational Feasibility: The system will be user-friendly, designed to integrate with
existing IDS infrastructure, and provide additional value through its explainability features
without significantly increasing the operational burden.
12 | P a g e
3.4.2 Security and Risk Study
13 | P a g e
CHAPTER 4
SYSTEM DESIGN
The System Design phase focuses on defining the architecture, components, modules, and
interfaces for the Explainable Signature-Based Intrusion Detect ion System (ESBIDS). This
section provides a detailed view of how the system is structured, how data flows through the
system, and how various components interact using diagrams and modular breakdowns.
14 | P a g e
oExplanation Dashboard: Displays detailed explanations and visualizations of
detection events, including matched signatures, associated traffic features, and
contextual reasoning for the alert.
o Log Management: Logs all traffic data, detection events, and explanations for
audit and review.
5. Anomaly Detection (Optional - Hybrid Approach):
o This layer involves an optional anomaly detection system that works in conjunction
with the signature-based system. It uses machine learning models to identify
unknown or zero-day attacks and provides corresponding explanations when
anomalies are detected.
15 | P a g e
4.2 Data Flow Diagrams (DFD)
Data Flow Diagrams (DFDs) provide a graphical representation of the flow of data within the
system. It helps to understand how data moves from input (network traffic) to output (explanations
and alerts) and how various processes handle the data.
The Level 0 DFD (Context Diagram) represents the system as a single process and outlines its
interaction with external entities.
• External Entities:
o Network Traffic Source: Inputs real-time network traffic data into the system.
o Security Analyst: Receives alerts, explanations, and visualizations from the
system.
o Signature Database Source: Supplies updated signatures to the system
periodically.
Data Flow:
• Network traffic flows into the system, which processes the data to generate alerts and
explanations. The results are then sent to the security analyst for further investigation.
The Level 1 DFD provides more details on the internal processes of the system, showing how
data is handled at each stage.
16 | P a g e
o The preprocessed traffic is compared against the known attack signatures stored in
the database. If a match is found, it triggers the next process.
• Process 3: Explanation Generation:
o The matched signature and network data are processed by the explanation
generator, which creates a human-readable explanation for the detection.
• Process 4: Alert Processing and Visualization:
o The alert processor formats the detection and explanation into a structured alert
message, which is visualized on the dashboard for the security analyst.
17 | P a g e
This diagram represents the workflow of an Explainable Signature-Based Intrusion
Detection System (IDS). It captures network traffic, preprocesses it, and matches it against known
attack signatures stored in a database. If a match is found, an explanation is generated, and an alert
is processed and visualized for security analysts. The system enhances threat detection by
providing understandable explanations for detected intrusions.
Unified Modeling Language (UML) diagrams help represent the system's design and its
interactions through class diagrams, sequence diagrams, and use case diagrams.
Use Cases:
• Monitor Network Traffic: The system monitors real-time traffic for malicious activity.
• Generate Alerts: The system triggers alerts when a signature match is found.
• Provide Explanations: The system generates human-readable explanations for the alerts.
• Update Signatures: The system updates the signature database as new signatures are
added.
• View Logs: The security analyst can view the logs and past events for further analysis.
A security analyst receives the alerts and explanations, allowing them to understand
the nature of the threat and take necessary countermeasures. The signature database is
continuously updated to ensure the detection of new threats, and the system administrator
is responsible for managing system configurations and updating security policies. The
administrator also ensures that the system remains effective against emerging threats by
fine-tuning its detection mechanisms.
18 | P a g e
more interpretable and actionable, reducing false positives and improving overall network
security.
19 | P a g e
4.3.2 Class Diagram
The Class Diagram outlines the key classes and relationships in the system.
20 | P a g e
This UML class diagram represents the components of an Explainable Signature-
Based Intrusion Detection System (IDS). The TrafficCollector captures and forwards
network traffic to the SignatureMatcher, which detects malicious patterns. The
ExplanationGenerator creates explanations for detected threats, and the AlertManager
sends alerts to the Dashboard for visualization. Additionally, the Logger stores and
retrieves logs for system monitoring.
The Sequence Diagram depicts the flow of messages between objects over time,
demonstrating the interaction during an event detection scenario.
21 | P a g e
4.4 Implementation Modules
The system is divided into several implementation modules to handle different aspects of the
detection and explanation process. These modules interact with each other to form a cohesive and
efficient system.
• Functionality: Captures network traffic in real time, filters out irrelevant data, and sends
the processed packets to the Signature Matching Module.
• Tools/Technologies: Tools like tcpdump or libpcap can be used for packet capture. The
module should be capable of handling high volumes of data with low latency.
• Functionality: Compares incoming network traffic with the stored attack signatures. If a
match is found, it forwards the event to the Explanation Generator module.
• Tools/Technologies: Open-source tools like Snort or Suricata can be used to implement
signature matching functionality.
• Functionality: Processes detection events and explanations, formats them into alerts, and
presents them to the security analyst through the dashboard.
• Tools/Technologies: Python or Java can be used to build this module, with integration into
log management systems like Elastic Stack (ELK) for visualization.
22 | P a g e
4.4.6 Log Management Module
• Functionality: Stores event logs and explanations for audit, review, and compliance
purposes. Security teams can query these logs to investigate past incidents.
• Tools/Technologies: Databases such as Elasticsearch or PostgreSQL can be used to store
and manage logs efficiently.
23 | P a g e
CHAPTER 5
SOFTWARE ENVIRONMENT
The Software Environment outlines the tools, frameworks, and platforms required for the
development, deployment, and operation of the Explainable Signature-Based Intrusion
Detection System (ESBIDS). The choice of software environment is critical for ensuring that the
system runs efficiently, is maintainable, and can integrate with other network components.
• Python: Used for implementing explainability algorithms, traffic analysis, and signature
matching modules. Python’s extensive library support for machine learning (e.g., LIME,
SHAP, Scikit-learn) makes it ideal for implementing explainable models.
• C++/C: Can be used in the packet capturing and traffic processing modules for low-level
network data manipulation and optimization.
24 | P a g e
5.4 Explainability Tools
25 | P a g e
mechanisms, and access control features ensure data consistency, reliability, and security.
Integrating these databases with an Explainable Signature-Based Intrusion Detection
System (ESBIDS) improves the auditability and transparency of IDS operations, allowing
analysts to review past alerts, understand system decisions, and refine detection strategies
for evolving cyber threats.
5.6 Visualization and Reporting
• Grafana/Kibana: Tools used for creating dashboards to display alerts, signatures, and
explanations. These tools help provide actionable insights through visual representations
of data.
• pytest/UnitTest (Python): For writing and running test cases to verify the accuracy and
robustness of each module in the system.
26 | P a g e
CHAPTER 6
EVALUATION AND TESTING
The Evaluation and Testing phase ensures that the system performs according to its
design specifications, detects intrusions accurately, provides meaningful explanations, and
integrates smoothly with existing network infrastructure.
27 | P a g e
• Scalability: Assess the system’s ability to scale in environments with large volumes of
traffic. The scalability of an Intrusion Detection System (IDS) determines its ability to
handle increasing network traffic without performance degradation. As data volume grows,
the system must efficiently distribute processing tasks across multiple nodes or use high-
performance computing techniques to maintain real-time analysis. Implementing load
balancing, parallel processing, and cloud-based solutions enhances scalability. A well-
designed IDS ensures consistent detection accuracy and minimal latency, even in large-
scale or high-speed network environments.
• False Positive/False Negative Rates: Evaluate the system’s rate of false positives and
false negatives to minimize irrelevant alerts. The false positive and false negative rates are
critical metrics in evaluating an Intrusion Detection System (IDS). A high false positive
rate leads to excessive, irrelevant alerts, overwhelming security teams and causing alert
fatigue, while a high false negative rate means real threats go undetected, increasing
security risks. Balancing these rates requires fine-tuning detection rules, optimizing
threshold settings, and incorporating explainability methods to improve decision-making.
A well-calibrated IDS ensures accurate threat detection while minimizing unnecessary
alerts, enhancing both security efficiency and operational effectiveness.
6.2 Testing Methods
• Unit Testing: Each module (traffic capture, signature matching, explanation generator)
will undergo unit testing to validate functionality and accuracy. Unit testing ensures that
each module of the Intrusion Detection System (IDS), including traffic capture, signature
matching, and explanation generation, functions correctly and accurately. Each component
is tested independently to detect errors early and improve system reliability. Automated
tests validate performance, accuracy, and robustness under various conditions. This
approach enhances system stability and detection efficiency before full deployment.
• Integration Testing: Modules will be integrated and tested in a simulated environment to
ensure smooth communication and data flow. Integration testing verifies that all IDS
modules—traffic capture, signature matching, and explanation generation—work together
seamlessly. Testing in a simulated environment ensures proper data flow, communication,
and interoperability between components. It helps identify and resolve issues related to
latency, data loss, or misconfigurations. A successful integration test ensures the IDS
functions efficiently and accurately in real-world scenarios.
• Load Testing: Stress test the system with high volumes of network traffic to assess
performance and stability under load. Load testing evaluates the IDS’s ability to handle
high volumes of network traffic while maintaining performance and stability. By
simulating real-world traffic spikes, it helps identify bottlenecks, latency issues, and
potential failures. Optimizing resource allocation, parallel processing, and scalability
ensures efficient threat detection under heavy loads. A well-executed load test guarantees
reliable and real-time intrusion detection in high-traffic environments.
28 | P a g e
• User Acceptance Testing (UAT): Security analysts will interact with the system to ensure
that explanations are clear and alerts are useful for decision-making. User Acceptance
Testing (UAT) involves security analysts evaluating the IDS to ensure that alerts are
relevant and explanations are clear and actionable. Analysts interact with the system to
assess its usability, interpretability, and effectiveness in real-world threat detection.
Feedback from UAT helps refine detection accuracy, reduce false positives, and improve
the user interface. A successful UAT ensures the IDS meets operational needs and
enhances security decision-making.
6.3 Evaluation Metrics
• Detection Rate: Percentage of correctly detected malicious activities. The detection rate
measures the percentage of correctly identified malicious activities by the Intrusion
Detection System (IDS). A high detection rate indicates effective threat identification,
minimizing the risk of undetected attacks. This metric is influenced by the quality of
signatures, model training, and feature selection. Continuous updates and optimizations
help maintain a robust and reliable IDS with accurate threat detection.
• False Positive Rate: Proportion of benign events incorrectly flagged as malicious. The
false positive rate represents the proportion of benign events mistakenly classified as
malicious by the Intrusion Detection System (IDS). A high false positive rate can
overwhelm security teams with unnecessary alerts, leading to alert fatigue and reduced
efficiency. Optimizing signature rules, threshold settings, and explainability techniques
helps minimize false positives. A well-calibrated IDS ensures accurate threat detection
while reducing irrelevant alerts.
• Explanation Understandability:Feedback from security analysts on the clarity of
explanations. Explanation understandability measures how well security analysts can
interpret the IDS-generated explanations for detected threats. Clear, concise, and actionable
insights improve analysts' ability to respond effectively to security incidents. Feedback
helps refine the explanation model, ensuring that alerts are transparent and informative. A
well-explained detection process enhances trust, usability, and decision-making in
cybersecurity operations.
• System Latency: Time taken from traffic capture to alert generation. System latency refers
to the time taken from traffic capture to alert generation in the Intrusion Detection System
(IDS). Low latency ensures real-time threat detection, allowing for faster incident response.
Factors affecting latency include processing speed, algorithm efficiency, and system
workload. Optimizing these aspects ensures quick and accurate intrusion detection without
delaying security actions.
29 | P a g e
• Model Performance Analysis
After training the One-Class Support Vector Machine (OCSVM) on normal traffic data, the
model was tested with both normal and anomalous samples. The results demonstrated the system's
effectiveness in detecting intrusions. The OCSVM model achieved the following performance
metrics:
• Accuracy: 94.8%
• Precision:91.5%
• Recall: 95.2%
• F1-score: 93.3%
The high accuracy indicates that the model effectively differentiates between normal and
malicious network traffic. The precision of 91.5% shows that most of the flagged anomalies are
indeed intrusions, while the recall of 95.2% highlights the system's ability to detect a significant
portion of all actual intrusions. The F1-score of 93.3% confirms the model’s balanced
performance, combining both precision and recall. The false positive rate of 4.7% is low,
indicating that the model does not frequently misclassify normal traffic as anomalous, making it
suitable for real-world deployment.
30 | P a g e
CHAPTER 7
import numpy as np
import pandas as pd
from sklearn import utils
import matplotlib
read_data =
pd.read_csv(r"C:\Users\dhara\OneDrive\Desktop\datasets\kdd_train.csv",low_memory=False)
#accuracy,algo,confusionmatrix,chi
read_data = read_data[read_data["logged_in"] == 1]
#read_data = read_data[read_data['service'] == "http"]
read_data
#read_data["duration"] = np.log((read_data["duration"] + 0.1).astype(float))
#read_data["src_bytes"] = np.log((read_data["src_bytes"] + 0.1).astype(float))
#read_data["dst_bytes"] = np.log((read_data["dst_bytes"] + 0.1).astype(float))
read_data.loc[read_data['labels'] == "normal", "traffic_behaviour"] = 1
read_data.loc[read_data['labels'] != "normal", "traffic_behaviour"] = 0
read_data
read_data.drop(read_data[(read_data["dst_bytes"]<0)].index, inplace=True)
read_data.drop(read_data[(read_data["src_bytes"]<0)].index, inplace=True)
31 | P a g e
#read_data.drop(read_data[(read_data["duration"]<0)].index, inplace=True)
y = read_data["traffic_behaviour"]
read_data
train_protocol_type = {'tcp': 0, 'udp': 1, 'icmp': 2}
train_protocol_type.items()
read_data.protocol_type = [train_protocol_type[item] for item in read_data.protocol_type]
train_service = {'aol': 1, 'auth': 2, 'bgp': 3, 'courier': 4, 'csnet_ns': 5, 'ctf': 6, 'daytime': 7, 'discard':
8, 'domain': 9, 'domain_u': 10, 'echo': 11, 'eco_i': 12, 'ecr_i': 13, 'efs': 14, 'exec': 15,
'finger': 16, 'ftp': 17, 'ftp_data': 18, 'gopher': 19, 'harvest': 20, 'hostnames': 21, 'http': 22,
'http_2784': 23, 'http_443': 24, 'http_8001': 25, 'imap4': 26, 'IRC': 27, 'iso_tsap': 28,
'klogin': 29, 'kshell': 30, 'ldap': 31, 'link': 32, 'login': 33, 'mtp': 34, 'name': 35,
'netbios_dgm': 36, 'netbios_ns': 37, 'netbios_ssn': 38, 'netstat': 39, 'nnsp': 40, 'nntp': 41,
'ntp_u': 42, 'other': 43, 'pm_dump': 44, 'pop_2': 45, 'pop_3': 46, 'printer': 47, 'private':
48, 'red_i': 49, 'remote_job': 50, 'rje': 51, 'shell': 52, 'smtp': 53, 'sql_net': 54, 'ssh': 55,
'sunrpc': 56, 'supdup': 57, 'systat': 58, 'telnet': 59, 'tftp_u': 60, 'tim_i': 61, 'time': 62,
'urh_i': 63, 'urp_i': 64, 'uucp': 65, 'uucp_path': 66, 'vmnet': 67, 'whois': 68, 'X11': 69,
'Z39_50': 70}
read_data.service = [train_service[item] for item in read_data.service]
# Changing the training flag coloumn
train_flag = {'SF': 0, 'S0': 1, 'REJ': 2, 'RSTR': 3, 'RSTO': 4, 'S1': 5, 'SH': 6, 'S2': 7, 'RSTOS0': 8,
'S3': 9, 'OTH': 10}
read_data.flag =[train_flag[item] for item in read_data.flag]
train_replace_map = {'normal':"normal",'DOS': ['back', 'land', 'pod', 'neptune', 'smurf', 'teardrop'],
'R2L': ['ftp_write', 'guess_passwd', 'imap', 'multihop', 'spy', 'phf', 'warezclient',
'warezmaster'], 'U2R': ['buffer_overflow', 'loadmodule', 'perl', 'rootkit'],
'PROBE': ['ipsweep', 'nmap', 'portsweep', 'satan']}
read1_data = read_data.assign(
labels=read_data['labels'].apply(
32 | P a g e
lambda x: [key for key, value in train_replace_map.items() if x in value]))
read1_data["labels"]
train_label= {"['normal']": 0, "['DOS']": 1, "['R2L']": 2, "['U2R']": 3, "['PROBE']": 4}
read1_data["labels"]=read1_data["labels"].astype(str)
read1_data.labels = [train_label[item] for item in read1_data.labels]
#read1_data['duration'] = np.where((read1_data.duration <= 2), 0, 1)
#read1_data['src_bytes'] = np.where((read1_data.src_bytes <= 2), 0, 1)
#read1_data['dst_bytes'] = np.where((read1_data.dst_bytes <= 2), 0, 1)
x = read1_data
x
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(x,y,test_size=0.8, random_state=100)
Chi2 TEST
from sklearn import datasets
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
chi2_selector = SelectKBest(chi2,k=43)
X_kbest = chi2_selector.fit_transform(x, y)
p_values=pd.Series(X_kbest[0])
p_values.index=x.columns
p_values.sort_values(ascending=False)
33 | P a g e
read_data
=pd.read_csv(r"C:\Users\dhara\OneDrive\Desktop\datasets\kdd_train.csv",low_memory=False)
read_data["labels"]
read_data = read_data[read_data['service'] == "http"]
read_data = read_data[read_data["logged_in"] == 1]
applicable_features = [
"duration",
"src_bytes",
"dst_bytes",
"labels",
"dst_host_srv_count",
"dst_host_count"]
read_data = read_data[applicable_features]
read_data
read_data["duration"] = np.log((read_data["duration"] + 0.1).astype(float))
read_data["src_bytes"] = np.log((read_data["src_bytes"] + 0.1).astype(float))
read_data["dst_bytes"] = np.log((read_data["dst_bytes"] + 0.1).astype(float))
read_data["dst_host_srv_count"] = np.log((read_data["dst_host_srv_count"] + 0.1).astype(float))
read_data["dst_host_count"] = np.log((read_data["dst_host_count"] + 0.1).astype(float))
read_data.head
read_data.loc[read_data['labels'] == "normal", "traffic_behaviour"] = 1
read_data.loc[read_data['labels'] != "normal", "traffic_behaviour"] = -1
read_data
target = read_data['traffic_behaviour']
outliers = target[target == -1]
print("outliers.shape", outliers.shape)
print("outlier fraction", outliers.shape[0]/target.shape[0])
read_data.drop(["labels","traffic_behaviour"], axis=1, inplace=True)
read_data.shape
34 | P a g e
from sklearn.model_selection import train_test_split
train_data, test_data, train_target, test_target = train_test_split(read_data, target, train_size =
0.8)
train_data.shape
train_data.tail
from sklearn import svm
nu = outliers.shape[0] / target.shape[0]
print("The calculated values of nu is:", nu)
35 | P a g e
cm_display.plot()
plt.show()
test_target.to_csv("test_target.csv")
36 | P a g e
# Generate explanation
exp = explainer.explain_instance(
data_row=sample.values[0],
predict_fn=predict_proba,
num_features=5 # Number of features to show in explanation
)
# Show the explanation
print(f"Explanation for test sample {sample_idx}:")
exp.as_pyplot_figure()
plt.show()
# Print the explanation as a list
print("\nFeature importance for this prediction:")
for feature, value in exp.as_list():
print(f"{feature}: {value}")
# Save the explanation to HTML if desired
exp.save_to_file('lime_explanation.html')
exp.show_in_notebook(show_table=True, show_all=False)
# Get the actual prediction for comparison
actual_pred = model.predict(sample)[0]
actual_label = test_target.iloc[sample_idx]
print(f"\nActual prediction: {'Normal' if actual_pred == 1 else 'Anomaly'}")
print(f"Actual label: {'Normal' if actual_label == 1 else 'Anomaly'}")
37 | P a g e
7.2 INPUT & OUTPUT :
Load the KDD CUP ’99 Dataset in VScode.
The code loads a dataset with pandas.read_csv() or an equivalent function.
The dataset has feature columns and target labels.
The notebook works on this data for feature selection (Feature Selection notebook) or anomaly
detection (Anomaly OCSVM notebook).
OUTPUT:
FEATURE SELECTION
38 | P a g e
Fig 7.3 Traffic Behavior Analysis of Logged-in Users
40 | P a g e
Fig 7.6 P-Values of Selected Features for Chi-Square Test
The code filters the dataset to include only logged-in users. It extracts the
"traffic_behaviour" column to analyze user activity. Labels are mapped using a predefined
dictionary to categorize them. Certain numerical features are converted into binary values based
on thresholds. The chi-square test selects the best 43 features, and their p-values are sorted.
41 | P a g e
ANAMOLY OCSVM ALGORITHM
42 | P a g e
Fig 7.9 Outlier Detection in Traffic Behavior Data
43 | P a g e
Fig 7.12 Performance Metrics of the Test Dataset
From the above fig the metrics column represents the performance metrics of the One-
Class SVM model on the test dataset, measuring the model's ability to generalize to unseen data.
The accuracy of 96.98% means that the model correctly classifies the majority of test samples. The
precision of 98.59% means that when the model predicts that a sample is normal, it is correct
98.59% of the time, which means that the model has a low false positive rate. The 98.31% recall
indicates that the model is successful in identifying 98.31% of all true normal cases, i.e., it is low
on the false negative front. The 98.45% F1-score, a balance between recall and precision, further
ensures that the model remains high on test set performance. These findings imply that the model
is extremely strong and reliable at identifying anomalies and has good generalization from training
to testing.
44 | P a g e
misclassified inaccurately anomalies as normal; and False Negatives (119), where it did not detect
anomalies and classified them as normal. With a high number of True Positives and True
Negatives, it indicates that the model, in general, does a good job. But the fact that there are 119
False Negatives shows there are some false negatives, and these might be important in security
applications. Fiddling with parameters or moving the anomaly threshold (nu) might assist the
model to better detect anomalies.
45 | P a g e
Fig 7.15 Feature Importance for Prediction
46 | P a g e
7.3 Results
• Improved Detection: The system was able to detect all known attacks from the signature
database with high accuracy.
• Enhanced Explainability: The use of LIME/SHAP in the explanation generator module
produced clear, interpretable explanations that helped security analysts better understand
why alerts were triggered.
• Reduced False Positives: The system demonstrated a significantly reduced false positive
rate compared to traditional SBIDS systems.
• Efficient Performance: The system handled high network traffic volumes with minimal
performance degradation, making it suitable for real-time use.
The Explainable Signature-Based Intrusion Detection System (X-SBIDS) was evaluated using
the KDD CUP '99 dataset, which contains both normal and malicious network traffic samples. The
performance of the system was analyzed in terms of its accuracy, precision, recall, F1-score, and
false positive rate. Additionally, the explainability of the system was assessed using SHAP
visualizations, providing insights into the influence of each feature on the model’s decisions.
To interpret the model’s predictions, SHAP (SHapley Additive exPlanations) was applied to
visualize the contribution of individual features. The SHAP summary plot revealed that the most
influential features in detecting anomalies were:
• src_bytes (bytes sent from source to destination): The most important feature, with a
positive correlation to anomalies. Large or sudden data transfers were often flagged as
intrusions.
• dst_bytes (bytes sent from destination to source): A major indicator of anomalous behavior,
particularly when a large volume of data was sent back, indicating possible data
exfiltration.
• count (number of connections to the same host): Anomalous samples showed a higher
connection count, signaling potential brute-force or DDoS attacks.
• srv_count (number of connections to the same service): Higher srv_count values were
linked to repeated access attempts, suggesting possible credential stuffing or scanning
activities.
• same_srv_rate (percentage of connections to the same service): Frequently observed in
legitimate traffic but showed abnormal patterns in some intrusion cases, making it a key
indicator of suspicious behavior.
47 | P a g e
The SHAP force plot provided instance-specific explanations, visualizing how each feature
contributed to individual predictions. For example, in a flagged anomaly, src_bytes and dst_bytes
had significantly higher values, pushing the model’s prediction toward the anomalous class. This
interpretability ensures that security analysts can understand why a particular sample was
classified as an anomaly, making the model more transparent and trustworthy.
1. Confusion Matrix:
The confusion matrix revealed that the model correctly classified 94.8% of the samples.
The small number of false positives indicates the model's reliability, while the low false
negatives demonstrate its effectiveness in capturing true intrusions.
2. ROC Curve:
The Receiver Operating Characteristic (ROC) curve showed a large area under the curve
(AUC = 0.96), highlighting the model's high discriminatory power in distinguishing
between normal and anomalous traffic.
The X-SBIDS system was compared against traditional machine learning models, such as
Random Forest, Decision Tree, and k-Nearest Neighbors (k-NN), using the same dataset. The
results showed that the OCSVM with SHAP-based interpretability outperformed the traditional
models in terms of both accuracy and interpretability:
The superior performance of the X-SBIDS demonstrates that combining anomaly detection
with explainability leads to both higher accuracy and improved transparency, making the system
more effective and reliable for real-world intrusion detection.
48 | P a g e
7.4 DISCUSSION
The integration of XAI (Explainable AI) with SHAP visualizations provides valuable
insights into the model's decision-making process. This interpretability enhances the system's
trustworthiness, making it suitable for deployment in critical infrastructure environments where
transparency is essential. Security analysts can understand why certain samples are flagged as
anomalies, enabling faster and more accurate threat responses.
Furthermore, the low false positive rate ensures that the system does not generate excessive
alerts, preventing alert fatigue and enhancing its usability in real-world security operations. The
system’s modular architecture also makes it scalable and adaptable to different datasets or network
environments, increasing its practicality and flexibility.
49 | P a g e
CHAPTER 8
FUTURE DEVELOPMENT AND CONCLUSION
Conclusion
A major advantage of ESBIDS is its ability to significantly reduce false positives, which
are a common challenge in traditional IDS. By integrating explainability techniques such as LIME
(Local Interpretable Model-Agnostic Explanations) and SHAP (SHapley Additive exPlanations),
the system provides insights into why an alert was triggered, helping analysts differentiate between
real threats and benign activities. This leads to more efficient security operations, reduced alert
fatigue, and improved resource allocation, as analysts can focus on genuine security incidents
rather than sifting through false alarms.
Additionally, ESBIDS enhances the overall security posture of the network by ensuring
that security events are not only detected but also properly understood. The ability to log and
analyze historical alerts using tools like Elastic Stack (ELK) and relational databases
(PostgreSQL/MySQL) ensures long-term auditability and forensic analysis, enabling
organizations to refine their detection strategies over time. Moreover, by providing explanations
alongside alerts, ESBIDS fosters greater trust and collaboration between automated security
systems and human analysts, making cybersecurity decision-making more informed and proactive.
50 | P a g e
further enhance its ability to detect zero-day threats and sophisticated cyber-attacks. By
continuously refining its explanation mechanisms, scalability, and real-time performance, ESBIDS
has the potential to become a next-generation IDS solution, combining high detection accuracy,
minimal false positives, and strong interpretability to safeguard modern networks from cyber
threats.
Future Development
• Anomaly Detection Integration: While the current system primarily focuses on signature-
based detection, future versions could incorporate machine learning-based anomaly
detection to identify unknown threats and zero-day attacks.
• Improved Visualization: Enhancing the dashboard with more advanced visualizations,
such as real-time attack maps and trend analysis, could provide security analysts with
deeper insights.
• Continuous Learning: Future systems could include continuous learning mechanisms to
update signatures automatically based on new attack patterns or analyst feedback.
• Integration with Threat Intelligence: Connecting the system with external threat
intelligence platforms can help improve detection by using the latest threat data.
51 | P a g e
CHAPTER 9
APPENDIX
• IDS (Intrusion Detection System): A software or hardware tool that monitors a network
or systems for malicious activities or policy violations. Alerts are typically generated when
such activities are detected.
• IPS (Intrusion Prevention System): A system that actively prevents detected threats by
blocking or mitigating the potential damage caused by the malicious activity.
• Signature-Based Detection: A method of identifying intrusions by comparing network or
system activity against a database of known attack signatures or patterns.
• Signature Database: A repository of predefined signatures or patterns representing known
types of attacks or vulnerabilities that an IDS uses to detect malicious activity.
• Explainable AI (XAI): A field of artificial intelligence that focuses on making AI models
interpretable and understandable by humans. In this context, XAI helps explain why certain
network traffic was flagged as suspicious.
• LIME (Local Interpretable Model-agnostic Explanations): A tool used to explain the
predictions of machine learning models by approximating them with simpler, interpretable
models at a local level.
• SHAP (SHapley Additive exPlanations): A method that provides consistent, feature-
based explanations for machine learning model predictions. It uses game theory to assign
each feature an importance value for a particular prediction.
• Packet Preprocessing: The process of filtering and cleaning raw network traffic data
(packets) to extract relevant information for further analysis.
• Alert: A notification generated by an IDS or IPS when it detects potential malicious
activity based on the matching of network traffic to known attack signatures.
• False Positive: A situation where an IDS incorrectly identifies benign activity as malicious,
leading to unnecessary alerts.
• False Negative: A situation where an IDS fails to detect malicious activity, allowing the
attack to go unnoticed.
• Traffic Monitoring: The process of capturing and analyzing network traffic in real-time
to detect any suspicious activities.
• Dashboard: A visual interface that displays real-time information and alerts to security
analysts. It is often used for monitoring, analysis, and management of network security.
• Log Management: The process of collecting, storing, and analyzing log data generated by
a system or network. Logs are crucial for auditing and tracking network activities.
• Elasticsearch: A distributed search and analytics engine used for log and event data
analysis. It forms part of the ELK (Elasticsearch, Logstash, Kibana) stack.
52 | P a g e
• Kibana: A data visualization and exploration tool used in conjunction with Elasticsearch.
It provides visualizations such as charts and graphs for data stored in Elasticsearch.
• Grafana: An open-source platform for monitoring and observability. It is used to create
and share dashboards and visualizations for performance metrics.
• Unit Testing: A software testing technique where individual components or modules of a
system are tested to verify that they function correctly.
• Integration Testing: A phase of software testing in which different modules or
components are tested together to ensure they work correctly as an integrated system.
• User Acceptance Testing (UAT): A testing process in which actual users test the system
to verify that it meets the required business needs and performs as expected.
53 | P a g e
9.2 References
54 | P a g e
11. Network Security Monitoring with Suricata and Elastic Stack
Cichonski, P., Millar, T., & Scarfone, K. (2015). Guide to Intrusion Detection and
Prevention Systems (IDPS). NIST Special Publication 800-94.
DOI: https://doi.org/10.6028/NIST.SP.800-94000
12. Mahbooba, B., Timilsina, M., Sahal, R., & Serrano, M. (2021). Explainable artificial
intelligence (XAI) to enhance trust management in intrusion detection systems using
decision tree model. Complexity, 2021(1), 6634811.
https://doi.org/10.1155/2021/6634811
13. Einy, S., Oz, C., & Navaei, Y. D. (2021). The anomaly- and signature-based IDS for
network security using hybrid inference systems. Mathematical Problems in Engineering,
2021(1), 6639714. https://doi.org/10.1155/2021/6639714
14. Joyo, W. A., Samual, J., Elango, S., Ismail, M., Johari, Z., & Stephen, D. (2020). IDS:
Signature-Based Peer-to-Peer Intrusion Detection System for Novice Users. ICCNCT
2019. Springer Nature Switzerland AG, 114–126. https://doi.org/10.1007/978-3-030-
41098-0_10
15. Nawaal, B., Haider, U., Khan, I. U., & Fayaz, M. (2023). Signature-Based Intrusion
Detection System for IoT. CRC Press - IoT Security. 135–148.
https://doi.org/10.1201/9781003183472
16. Neupane, S., Ables, J., Anderson, W., Mittal, S., Rahimi, S., Banicescu, I., & Seale, M.
(2022). Explainable Intrusion Detection Systems (X-IDS): A Survey of Current Methods,
Challenges, and Opportunities. IEEE Access, 10, 112391–112413.
https://doi.org/10.1109/ACCESS.2022.3216617
17. Denning, D. E. (1987). An Intrusion-Detection Model. IEEE Transactions on Software
Engineering, 13(2), 222 232. https://doi.org/10.1109/TSE.1987.232894
18. Sommer, R., & Paxson, V. (2010). Outside the Closed World: On Using Machine Learning
for Network Intrusion Detection. IEEE Symposium on Security and Privacy, 305-316.
https://doi.org/10.1109/SP.2010.25
19. Shone, N., Ngoc, T. N., Phai, V. D., & Shi, Q. (2018). A Deep Learning Approach to
Network Intrusion Detection. IEEE Transactions on Emerging Topics in Computational
Intelligence, 2(1), https://doi.org/10.1109/TETCI.2017.2772792
20. Khan, F. A., Gani, A., Wahab, A. W. A., Rodrigues, J. J. P. C., & Ko, K. (2021).
Explainable Machine Learning Based Cybersecurity Threat Detection. Future Generation
Computer Systems, 115(1), 56-69. https://doi.org/10.1016/j.future.2020.07.018
21. Laskov, P., Düssel, P., Schäfer, C., & Rieck, K. (2005). Learning Intrusion Detection:
Supervised or Unsupervised? International Conference on Image Analysis and Processing,
https://doi.org/10.1007/11553595_7
55 | P a g e