Anomaly-Based Intrusion Detection System
Veeramreddy Jyothsna and Koneti Munivara Prasad
Abstract
Keywords: intrusion detection, data mining, classification based, DoS, Probe, U2R,
R2L, false alarm rate, zero-day attacks
1. Introduction
The proliferation of the Internet has brought numerous inventions and technological developments. Advances in business have pushed organizations and governments worldwide to build and use sophisticated, modern networks. These networks combine security mechanisms such as encryption, data integrity, and authentication with technologies like distributed storage systems, voice over Internet protocol (VoIP), wireless access, and web services.
Enterprises are increasingly opening these systems to outside access. For instance, many business organizations give their partners access to services on their internal networks through intranets and the Internet, and enterprises let customers interact with their systems through e-commerce transactions and let employees reach
Computer and Network Security
data through virtual private networks (VPNs). This openness makes the systems more vulnerable to attacks and intrusions. Security threats come not only from external intruders but also from internal users, in the form of abuse and misuse. A firewall simply blocks network traffic and cannot protect against intrusion attempts. In contrast, an intrusion detection system (IDS) can monitor abnormal activity on the network.
Intrusion detection systems play a vital role in research and development as attacks on computers and networks increase [1]. An IDS monitors the events occurring in a computer system or network and analyzes them for patterns of intrusion. It examines a host or a network to spot potential intrusions. Host-based systems examine system calls and process identifiers, mainly data related to the operating system. Network-based systems, on the other hand, analyze network-related events such as traffic volume, IP addresses, service ports, and the protocol used. Intrusion detection systems will
ii. assess the integrity of critical system and data files; and
DOI: http://dx.doi.org/10.5772/intechopen.82287
volume of signatures, the performance of the engine might also suffer. Because of this, intrusion detection systems are deployed on multiprocessor machines with Gigabit network cards. IDS developers try to develop new signatures before attackers develop new attacks, in order to protect the system against any new kind of attack.
Anomaly detection systems rely chiefly on network behavior. If the observed behavior stays within the predefined acceptable behavior, the network transaction is accepted; otherwise, the anomaly detection system triggers an alert [3]. Acceptable network behavior can be either learned or predetermined through specifications or conditions defined by the network administrator.
The crucial stage in determining behavior is the ability of the detection engine to handle multiple protocols at every level. The IDS engine must be able to understand how each protocol works and what its goal is. Although protocol analysis is computationally expensive, the richer rule set it enables helps lower the rate of false-positive alarms.
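The accept-or-alert rule described above can be illustrated with a minimal sketch. It assumes a single numeric behavior metric (here, connections per second) and a baseline learned as mean and standard deviation from traffic known to be normal; the metric, the function names, and the tolerance `k = 3` are illustrative assumptions, not part of the chapter.

```python
from statistics import mean, pstdev

def learn_baseline(normal_samples):
    """Learn the acceptable behavior (mean, std) from normal traffic samples."""
    return mean(normal_samples), pstdev(normal_samples)

def is_anomalous(value, baseline, k=3.0):
    """Alert when the value lies more than k standard deviations from the mean.

    k is a hypothetical tolerance; an administrator could equally fix the
    acceptable range by hand instead of learning it.
    """
    mu, sigma = baseline
    return abs(value - mu) > k * sigma

# Hypothetical connections-per-second readings observed during normal operation.
normal_rates = [98, 102, 100, 101, 99, 100]
baseline = learn_baseline(normal_rates)

for rate in (100, 450):
    verdict = "alert" if is_anomalous(rate, baseline) else "accept"
    print(rate, verdict)
```

A real engine would track many such metrics per protocol, which is where the cost of protocol analysis mentioned above comes from.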
Defining the rule sets is one of the key drawbacks of anomaly-based detection. The efficiency of the system depends on how well the rule sets are implemented and tested against all the protocols. In addition, the variety of protocols used by different vendors affects the rule-definition process.
Custom protocols add further complexity to rule definition. For accurate detection, administrators must clearly understand the acceptable network behavior. Once the rules and protocols are properly in place, however, the anomaly detection procedure is likely to perform more efficiently.
If malicious behavior falls within the accepted behavior, however, it may go unnoticed. The major benefit of an anomaly-based detection system is its ability to detect novel attacks. This approach remains feasible even when no signature pattern matches, and it also works when the traffic falls outside regular patterns.
Figure 1.
Common intrusion detection framework architecture.
Figure 2.
Common anomaly-based network IDS.
Figure 3.
Classification of anomaly-based intrusion detection techniques.
computing based, data mining based, user intention identification, and computer
immunology.
i. They do not require any prior knowledge of attack signatures, so they can detect zero-day attacks.
ii. As the system does not depend on signatures, no signature updates are required. Hence it is easy to maintain.
iii. Intrusion activities that occur over an extended period of time can be identified accurately, and these systems are good at detecting DoS attacks.
Knowledge-based techniques extract knowledge from specific attacks and system vulnerabilities. This knowledge can then be used to identify intrusions or attacks happening in the network or system. They generate an alarm as soon as an attack is detected, and they can be used for both misuse and anomaly-based detection [5].
The knowledge-based techniques are broadly classified as state transition analysis, expert systems, and signature analysis.
The knowledge-based techniques possess good accuracy and very low false alarm rates. The knowledge gathered makes it easier for the security analyst to take preventive or corrective action.
The knowledge-based techniques maintain knowledge of each attack based on careful and detailed analysis, which is a time-consuming task. Keeping this knowledge up to date for each attack is also difficult.
A knowledge-based IDS can detect attacks whose patterns are known, but it has difficulty detecting insider attacks. Data mining techniques are one solution. The core idea is to extract useful patterns, including previously ignored ones, from the dataset [6].
The data mining-based techniques are further classified into clustering, association rule discovery, classification, K-nearest neighbor, and decision tree methods.
The key advantages of data mining-based techniques are as follows:
ii. Because the models are precomputed in the training phase, each instance can be compared against them faster in the testing phase.
The key drawbacks of data mining-based techniques are as follows:
ii. They require high storage and are slow in classifying due to high dimensionality.
An intrusion detection system can be built on features that characterize user or system usage, in order to distinguish abnormal activities from normal ones. Early work on anomaly detection focused on profiling system or user behavior from monitored system logs or accounting log data. Such log data may contain UNIX shell commands, system calls, keystrokes, audit events, and the network packets used.
4. NSL-KDD dataset
The NSL-KDD [8] dataset is a refined version of its predecessor, the KDD99 dataset. KDD99 comprises close to 4,900,000 connection vectors, and NSL-KDD retains a much smaller subset of them with the redundant records removed. Every connection vector consists of 41 features, of which 34 are continuous and 7 are discrete. Each vector is labeled as either normal or attack. There are four major categories of attacks labeled in NSL-KDD: denial of service (DoS), probing (Probe), user-to-root (U2R), and remote-to-local (R2L).
iii. User-to-root attack (U2R): The attacker enters the local system using the authorized credentials of the victim user and tries to exploit vulnerabilities to gain administrator privileges. Examples of U2R attacks are “load module,” “buffer overflow,” “rootkit,” and “perl.”
iv. Remote-to-local attack (R2L): The attacker accesses the targeted system or network from a remote machine and tries to gain local access to the victim machine. Examples of R2L attacks are “phf,” “warezmaster,” “warezclient,” “spy,” “imap,” “ftp write,” “multihop,” and “guess passwd.”
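When preparing the data, each record's attack label has to be mapped to its category. The sketch below covers only the U2R and R2L examples listed above; the helper names are hypothetical, and the underscore spellings are an assumption about how the labels appear in the dataset files.

```python
# Attack names taken from the U2R and R2L examples in the text.
# Both the spaced spelling used in prose and an underscore spelling are accepted.
ATTACK_CATEGORY = {
    "loadmodule": "U2R", "load_module": "U2R",
    "buffer_overflow": "U2R", "rootkit": "U2R", "perl": "U2R",
    "phf": "R2L", "warezmaster": "R2L", "warezclient": "R2L",
    "spy": "R2L", "imap": "R2L", "ftp_write": "R2L",
    "multihop": "R2L", "guess_passwd": "R2L",
}

def normalize(label):
    """Lower-case the label and join words with underscores."""
    return "_".join(label.strip().lower().split())

def category_of(label):
    """Return 'Normal', 'U2R', 'R2L', or 'unknown' for labels not listed here."""
    name = normalize(label)
    if name == "normal":
        return "Normal"
    return ATTACK_CATEGORY.get(name, "unknown")

for label in ("buffer overflow", "guess_passwd", "Normal", "smurf"):
    print(label, "->", category_of(label))
```

DoS and Probe labels would be added to the same table in the same way.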
Although the research community has developed many methods and systems, a number of open research issues and challenges remain. Some of the research issues and challenges for anomaly-based intrusion detection systems are as follows:
i. A network anomaly-based IDS should reduce the false alarm rate, although eliminating false alarms entirely is not possible. Developing an intrusion detection system that is independent of its environment is another challenging task for the network anomaly-based intrusion detection community [9–13].
iii. When new patterns are identified in an anomaly-based network IDS (ANIDS), updating the database without compromising performance is another challenging task [9, 13].
v. Developing a suitable method for selecting the attributes for each category of attack is another important task [9–11].
The preprocessed set of network transactions is partitioned by label (“normal” transactions as one set, “DoS” transactions as another, and so on for the remaining labels). For each feature $f_i$, the unique values $f_i v(\mathrm{NTS})$ in the resultant normal transactions set (NTS) and their percentages of coverage are:
$$f_i v = \{ f_i(v_1, c_1),\ f_i(v_2, c_2),\ f_i(v_3, c_3),\ f_i(v_4, c_4),\ \ldots,\ f_i(v_j, c_j) \} \tag{1}$$
i. Consider the transactions set $\mathrm{ts}(A_k)$ denoting attack type $A_k$ (for example, DoS).
ii. For every feature $f_i(A_k)$, consider all its values as a set $f_i v(A_k)$. An empty set $f_i v$ of size $|f_i v(A_k)|$ is created and filled according to coverage so that $|f_i v(A_k)| \cong |f_i v|$, in which $|f_i v(A_k)|$ denotes the size of the feature values set of $f_i(A_k)$.
iii. This process generates the feature values vector $f_i v$ of the NTS such that $f_i v$ matches $f_i v(A_k)$ in size and also represents the coverage ratio of the values in $f_i v(\mathrm{NTS})$.
iv. The process is applied to every feature values set in the network transactions of attack $A_k$.
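Steps i to iv can be sketched for a single feature as follows. This is one possible reading, not the authors' implementation: the function names are hypothetical, and largest-remainder rounding is an assumed way of filling the vector so that it has exactly the target size while following the coverage ratios.

```python
from collections import Counter

def coverage(values):
    """Return {value: fraction of records carrying that value} (the c_j of Eq. 1)."""
    counts = Counter(values)
    return {v: c / len(values) for v, c in counts.items()}

def sized_vector(normal_values, target_size):
    """Build a vector of target_size normal values that follows their coverage.

    Integer arithmetic is used for the allocation so that rounding is exact;
    any slots left over go to the values with the largest remainders.
    """
    counts = Counter(normal_values)
    total = len(normal_values)
    alloc, remainder = {}, {}
    for value, count in counts.items():
        alloc[value], remainder[value] = divmod(count * target_size, total)
    leftover = target_size - sum(alloc.values())
    for value in sorted(counts, key=lambda v: remainder[v], reverse=True)[:leftover]:
        alloc[value] += 1
    vector = []
    for value, n in alloc.items():
        vector.extend([value] * n)
    return vector

# Hypothetical values of one feature (protocol type) in the normal set,
# resized to match an attack set that holds 5 values for the same feature.
normal = ["tcp"] * 6 + ["udp"] * 3 + ["icmp"]
vec = sized_vector(normal, target_size=5)
print(coverage(normal))
print(vec)
```

The resulting vector has the same length as the attack feature values set, which is what makes the two comparable feature by feature.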
The approach for measuring the proposed feature association support ($\mathrm{fas}$) metric considers the network transactions of the training dataset. The transaction value sets and the feature categorical values used in those transactions form two independent sets, which are used to build a duplex (bipartite) graph between them.
7.1 Assumptions
Let $\{f_1, f_2, f_3, \ldots, f_n\}\ \forall f_i = \{f_i v_1, f_i v_2, \ldots, f_i v_m\}$ be the set of categorical feature values used for forming the set of network transactions $T$. Here $T$ is the set of network transaction records of the given training set:
$$T = \{t_1, t_2, t_3, \ldots, t_n\}\ \ \forall t_i = \{\mathrm{val}(f_1), \mathrm{val}(f_2), \ldots, \mathrm{val}(f_i), \mathrm{val}(f_{i+1}), \ldots, \mathrm{val}(f_n)\} \tag{2}$$
The categorical values of the set of features related to each network transaction are considered a transaction value set $\mathrm{tvs}$, and the set of all transaction value sets is denoted “STVS.”
In Eq. 2 above, $\mathrm{val}(f_i)$ can be expressed as $\mathrm{val}(f_i) \in \{f_i v_1, f_i v_2, \ldots, f_i v_m\}$. The term “feature” refers to the current categorical value of the feature. Two features $\mathrm{val}(f_i)$ and $\mathrm{val}(f_j)$ are connected if and only if $(\mathrm{val}(f_i), \mathrm{val}(f_j)) \in \mathrm{tvs}_k$.
$$w(\mathrm{val}(f_1) \leftrightarrow \mathrm{val}(f_2)) = \frac{\mathrm{ctvs}}{|\mathrm{STVS}|} \tag{3}$$
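Equation 3 can be read as a co-occurrence ratio. The sketch below assumes that ctvs is the number of transaction value sets containing both feature values; that reading, and the function name, are assumptions rather than something the text states.

```python
from itertools import combinations

def edge_weights(stvs):
    """Weight every pair of feature values by the share of transaction
    value sets in which both occur: w = ctvs / |STVS| (assumed reading of Eq. 3).

    stvs is a list of sets of categorical feature values.
    Keys are pairs (a, b) with a < b in string order.
    """
    counts = {}
    for tvs in stvs:
        for pair in combinations(sorted(tvs), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: c / len(stvs) for pair, c in counts.items()}

# Hypothetical transaction value sets built from protocol, service, and flag values.
stvs = [
    {"tcp", "http"},
    {"tcp", "http", "SF"},
    {"udp", "dns"},
]
weights = edge_weights(stvs)
print(weights[("http", "tcp")])
```

Pairs that never occur together simply have no entry, which corresponds to the two values not being connected.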
Step 2: The edges between the transaction value sets and their corresponding feature categorical values can be expressed as:
$$E = \{ (\mathrm{tvs}_i, \mathrm{val}_j) : \mathrm{val}_j \in \mathrm{tvs}_i,\ \mathrm{tvs}_i \in \mathrm{STVS},\ \mathrm{val}_j \in V \} \tag{4}$$
Step 3: Treating the transaction value sets of the duplex (bipartite) graph as pivots and the feature categorical values as pure prerogatives, the pivot and prerogative values are then measured.
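$$\mathrm{fas}(f_i v_j) = \frac{\sum_{k=1}^{|\mathrm{STVS}|} \{ u(\mathrm{tvs}_k) : (f_i v_j \rightarrow \mathrm{tvs}_k) \neq 0 \}}{\sum_{k=1}^{|\mathrm{STVS}|} u(\mathrm{tvs}_k)} \tag{5}$$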
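Step 5: The Feature Association Impact Scale $\mathrm{fais}$ for every transaction value set $\mathrm{tvs}_i$ is estimated as:
$$\mathrm{fais}(\mathrm{tvs}_i) = 1 - \frac{\sum_{j=1}^{m} \{ \mathrm{fas}(\mathrm{val}_j)\ \exists\, \mathrm{val}_j \in V : \mathrm{val}_j \subset \mathrm{tvs}_i \}}{|\mathrm{tvs}_i|} \tag{6}$$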
Step 6: The Feature Association Impact Scale threshold $\mathrm{faist}$ can be measured as:
$$\mathrm{faist} = \frac{\sum_{i=1}^{|\mathrm{STVS}|} \mathrm{fais}(\mathrm{tvs}_i)}{|\mathrm{STVS}|} \tag{7}$$
Step 8: The Feature Association Impact Scale range is obtained in Steps 8.1 and 8.2:
Step 8.1: Calculate the lower threshold of $\mathrm{faist}$ as $\mathrm{faist}_l = \mathrm{faist} - \mathrm{sdv}_{\mathrm{faist}}$.
Step 8.2: Calculate the higher threshold of $\mathrm{faist}$ as $\mathrm{faist}_h = \mathrm{faist} + \mathrm{sdv}_{\mathrm{faist}}$.
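The chain from feature association support to the threshold range (Eqs. 5 to 7 and Step 8) can be sketched as below. Several points are assumptions: the pivot weights `u` are taken as given inputs, the condition in Eq. 5 is read as "the value occurs in the transaction value set", and the deviation term is taken to be the population standard deviation of the fais scores. The function names are hypothetical.

```python
from statistics import mean, pstdev

def fas(value, stvs, u):
    """Eq. 5: pivot weight of the sets containing `value`, over total pivot weight."""
    return sum(w for tvs, w in zip(stvs, u) if value in tvs) / sum(u)

def fais(tvs, stvs, u):
    """Eq. 6: one minus the average fas of the values in the set."""
    return 1 - sum(fas(v, stvs, u) for v in tvs) / len(tvs)

def fais_range(stvs, u):
    """Eq. 7 and Step 8: threshold (mean fais) and its lower and upper bounds."""
    scores = [fais(tvs, stvs, u) for tvs in stvs]
    faist = mean(scores)
    sdv = pstdev(scores)  # assumed: population standard deviation
    return faist - sdv, faist, faist + sdv

# Hypothetical transaction value sets with equal pivot weights.
stvs = [{"tcp", "http"}, {"tcp", "http"}, {"udp", "dns"}]
u = [1.0, 1.0, 1.0]

low, faist, high = fais_range(stvs, u)
print(round(low, 3), round(faist, 3), round(high, 3))
```

With these inputs the rarer set `{"udp", "dns"}` receives a higher fais score than the two frequent sets, since its values have lower association support.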
The total number of records chosen for the test is 25% of the actual dataset, that is, 34,361. The test records are drawn from the categories Probe, DoS, U2R, R2L, and Normal. The difference between the average canonical correlation (CC) and the standard deviation of CC is called the lower bound of the CC threshold. The sum of the CC average and the standard deviation of CC is called the upper bound of the CC threshold.
The records identified as normal make up 19.8% of the total test records: 4.7% of the test records are “false negatives” and 15.1% are “true negatives.” The records detected as “intruded transactions” make up 80.2% of the test records: 75.3% of the test records are “truly intruded transactions” and 4.9% are “false positives.”
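The four percentages above are enough to compute the standard performance metrics. The sketch below treats intruded transactions as the positive class, as the text does; the function name is hypothetical, and the formulas are the usual confusion-matrix definitions.

```python
def metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics; inputs may be counts or percentages."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }

# Percentages of the test records reported in the text.
m = metrics(tp=75.3, fp=4.9, tn=15.1, fn=4.7)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

The accuracy follows directly from the sum of true positives and true negatives, 75.3% + 15.1% = 90.4%.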
As per the results obtained, the proposed model is accurate up to 90.4%. The experiments are conducted on the same dataset using “anomaly-based network intrusion detection through assessing Feature Association Impact Scale (FAIS)” [14]. The results show that the proposed model is also scalable and
effective for detecting the scope of intrusion from a network transaction. Although the proposed FAIS model shows 88% accuracy, its major limitation is process complexity in training the system. This complexity in designing the scale using FAIS stems from the number of features selected for assessing the scale. The issue of selecting the optimal features for training the intrusion detection system using the Association Impact Scale is addressed in the FCAAIS [15] model.
Table 1 compares the performance metrics precision, recall/sensitivity, specificity, accuracy, and F-measure of FCAAIS and FAIS. Figure 4 indicates that the accuracy of FCAAIS with optimal features is 91%, whereas the accuracy of FAIS with all features is 88%. The precision of both FCAAIS with optimal features and FAIS with all features is 92%. The sensitivity, specificity, and F-measure are 96, 49, and 95%, respectively, for FCAAIS, and 95, 46, and 91%, respectively, for FAIS.
Table 1.
Comparison of performance metrics of FCAAIS and FAIS.
Figure 4.
The performance metrics observed for FCAAIS over FAIS.
According to the results, FCAAIS, which selects its feature set using canonical correlation, reduces the process complexity of designing the scale compared with FAIS (Figure 5 and Table 2).
The observed time complexity is adaptable, as the completion time is not directly related to the ratio of the feature count, which is due to the higher CC threshold, as shown in Figure 6. Hence, applying canonical correlation for optimized attribute selection is a significant improvement over the FAIS model.
It is observed that applying canonical correlation for optimized attribute selection results in a 3% improvement over the accuracy of FAIS [14]. Table 3 lists the precision, recall, and F-measure values calculated under divergent canonical correlation threshold values (Figure 7).
Figure 5.
The process computational time observed for FCAAIS over FAIS.
Table 2.
Process computational time of FCAAIS and FAIS.
Figure 6.
The FCAAIS consumption of time under divergent canonical correlation thresholds.
Table 3.
Precision, recall, and F-measure values calculated under divergent canonical correlation threshold.
Figure 7.
Performance analysis of the prediction accuracy of FCAAIS under divergent canonical correlation threshold
value.
9. Conclusion
Author details
© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
References
[2] Gong Y, Mabu S, Chen C, Wang Y, Hirasawa K. Intrusion detection system combining misuse detection and anomaly detection using genetic network programming. In: ICCAS-SICE. 2009
[7] Tsai C-F et al. Intrusion detection by machine learning: A review. Expert Systems with Applications. 2009;36(10):11994-12000
[8] Revathi S, Malathi DA. A detailed analysis on NSLKDD dataset using various machine learning techniques for intrusion detection. International Journal Engineering Research and Technology (IJERT). Dec 2013;2(12)
[10] Wagh SK, Pachghare VK, Kolhe SR. Survey on intrusion detection system using machine learning techniques. International Journal of Computer Applications (0975–8887). 2013;78(16):30-37
[15] Jyothsna V, Rama Prasad VV. FCAAIS: Anomaly based network intrusion detection through feature correlation analysis and association impact scale. ICT Express. 2016;2(3):103-116