Alt Albi
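This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3347619
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/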
Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2023.0322000
ABSTRACT Modern vehicles rely heavily on interconnected electronic control units (ECUs) through in-
vehicle networks to perform crucial functions such as braking and monitoring engine RPMs. However, the
increased number of ECUs and their connectivity to the in-vehicle network poses a security risk because
in-vehicle protocols such as the controller area network (CAN) lack encryption and authentication. To address this
problem, machine learning (ML) based intrusion detection systems (IDSs) have been proposed. However,
existing IDSs suffer from low detection accuracy, limited real-time response, and high resource requirements.
This study proposes an accurate and low-complexity IDS for in-vehicle networks based on feature fusion and
ensemble learning called the Feature Fusion and Stacking-based IDS (FFS-IDS). FFS-IDS fuses multiple
features extracted from raw network traffic and then classifies traffic instances into intrusive and non-
intrusive categories using a stacking ensemble learning of basic machine learning classifiers. Specifically,
a decision tree is employed as a base classifier, and random forest is used as a meta-learner. This work
implements and validates the FFS-IDS using real-time car hacking data sets and achieves better performance
than individual decision tree classifiers and popular ensemble learning methods such as Random Forest,
LightGBM, AdaBoost, and ExtraTree algorithms. The results demonstrate that FFS-IDS can detect Denial
of Service (DoS), Gear spoofing, and RPM spoofing attacks with up to 99% accuracy and Fuzzy attacks with
up to 97.5% accuracy using benchmark datasets. Overall, this study shows the effectiveness and practicality
of FFS-IDS in detecting intrusions in in-vehicle networks, which is essential for ensuring the cybersecurity
and safety of modern vehicles. Future work in this area could involve exploring additional feature extraction
techniques and fine-tuning hyperparameters to improve the performance of IDSs further.
INDEX TERMS Controller area network, In-vehicle network, Intrusion detection system, Feature fusion,
Ensemble learning, Car hacking
I. INTRODUCTION
Modern vehicles are equipped with electronic control units (ECUs) and robust computing systems, which have made them communication and computing-enabled terminals for intra-vehicle and inter-vehicle network communication [1]–[3]. This increased communication has led to more functionality and comfort, but it has also increased security threats [4]–[6]. The susceptibility of the controller area network (CAN) to different types of cyber attacks, including fuzzy attacks, DoS attacks, and spoofing attacks, has been a significant concern due to the lack of encryption and authentication policies in the de facto standard of in-vehicle networks [7].
Intrusion detection systems (IDSs) are vital tools for protecting in-vehicle networks by identifying unauthorized events in the networks [8], [9]. Machine learning (ML) and deep learning (DL) methods have been successfully implemented in developing effective IDSs for various applications [10]–[14].
However, most existing ML-based IDSs suffer from low detection accuracy, limited real-time response, and high computing-resource demands owing to the large number of network traffic features [1], [15]–[17]. This work proposes an effective IDS (called FFS-IDS) for the CAN bus of in-vehicle networks to address these issues. FFS-IDS
involves feature fusion and stacking-based ensemble learning to detect intrusions in in-vehicle networks. It fuses multiple features derived from primary features extracted from raw network traffic, capturing more information about network activity and improving the IDS's accuracy. It then classifies traffic instances into intrusive and non-intrusive categories based on stacking ensemble learning of basic ML classifiers, where a traditional decision tree is used as a base classifier and random forest is used as a meta-learner.
This work contributes to the field of in-vehicle network security by proposing a feature fusion method that combines basic features of in-vehicle network traffic to construct more comprehensive data subsets. This approach captures a broader range of information about network activity, improving the accuracy of intrusion detection. Additionally, I propose a stacking-based ensemble learning approach that further combines the outputs of multiple classifiers to improve detection performance. Specifically, I use a Bayes classifier, decision tree, and random forest classifier in a hierarchical structure to learn from the comprehensive data subsets. Finally, I validate the proposed methods using a real car hacking benchmark intrusion detection dataset for in-vehicle networks. The experimental results demonstrate that this approach significantly outperforms existing state-of-the-art methods regarding detection accuracy and false positive rate.
This paper is organized as follows. Section II provides an overview of existing research on intrusion detection in in-vehicle networks. Section III presents the intrusion detection problem in in-vehicle networks. Section IV introduces the proposed FFS-IDS system based on feature fusion and stacking-based ensemble learning for detecting intrusions in in-vehicle network traffic. The experimental setup, including the benchmark dataset and performance metrics used to evaluate the proposed system, is detailed in Section V. The results of the experiments are presented and compared with existing state-of-the-art methods in Section VI. Section VII highlights threats to the validity of this work. Finally, Section VIII summarizes the contributions and discusses future directions for further research.

II. RELATED WORK
There has been a growing interest in developing effective network traffic classification methods in academia and industry in recent years. Various techniques and features have been proposed for this purpose, including deep packet inspection, port number-based classification, and statistical classification methods [11], [18]–[21].
Deep packet inspection methods effectively identify known patterns and classify network traffic based on payload content-based information. However, they require specific hardware and have limitations in identifying multimedia-based and encrypted traffic. Port number-based classification methods use transport layer headers' port numbers to classify network traffic accurately. However, they fail to classify traffic from modern applications that do not use popular port numbers. Statistical information-based methods extract high-level features from basic packet header information, and ML and DL-based methods often use this extracted information to classify network traffic accurately. However, these methods may require significant computing resources and may not always provide high accuracy in network traffic classification [22]. The diverse strengths and limitations of these approaches are compared in Table 1, allowing for a comprehensive understanding of their suitability for different scenarios.

TABLE 1. Comparison of Network Traffic Classification Methods
| Method | Effectiveness | Limitations | Resources | Accuracy |
|---|---|---|---|---|
| Deep packet inspection | High for known patterns | Specific hardware, multimedia/encrypted traffic issues | High | High (specific patterns) |
| Port number-based classification | Accurate for traditional applications | Modern applications with non-standard ports | Low | Moderate |
| Statistical information-based methods | Effective with ML/DL | High resources, varying accuracy | Variable | Variable |

Several approaches have been proposed for detecting intrusions in in-vehicle network traffic using different techniques and features. For instance, Alshammari et al. [23] employed K nearest neighbor and support vector machine-based classifiers to detect intrusions in CAN bus traffic. Based on network traffic specifications, Olufowobi et al. [24] developed a real-time IDS for in-vehicle network traffic attacks and evaluated their system's performance using a synthetic and CAN intrusion dataset. In addition, Olufowobi et al. [25] proposed an adaptive cumulative sum method that utilizes statistical change-based information to detect attacks in CAN traffic quickly. Barletta et al. [26] used distance-based information to develop IDSs for in-vehicle networks. They suggested using the k-means clustering algorithm with an X-Y fused Kohonen network, which demonstrated high performance in detecting intrusion from the CAN dataset. However, their system has high computational complexity. Lee et al. [27] developed an IDS for detecting CAN attacks in in-vehicle network traffic using offset ratio and time interval-based information. They demonstrated the performance of their model by simulating different types of attacks, such as Fuzzy attacks, DoS attacks, and impersonation attacks.
DL methods have also been explored for detecting attacks in in-vehicle network traffic [28]. Song et al. [10] presented a deep convolutional neural network (CNN) based approach for detecting attacks in CAN traffic, which reported high attack detection accuracy. Similarly, Lo et al. [29] proposed using DL methods and different features, such as spatial and temporal features, to detect intrusions in the car hacking dataset. They used a CNN for extracting spatial features and a long short term memory (LSTM) network for extracting
temporal features, and the extracted features can correctly classify network traffic of the car hacking dataset.
Leveraging the taxonomy from Table 1 and highlighting the potential of ML/DL and statistical methods, Table 2 dissects the strengths and limitations of diverse approaches, features, datasets, and results. However, direct comparisons remain challenging due to individual study goals, data sources, and evaluation criteria.
Overall, the studies discussed in this section have shown promise in detecting intrusions in in-vehicle network traffic using various methods [30]. However, each approach has its strengths and limitations. For example, DL-based methods such as CNN and LSTM networks have shown high accuracy in detecting intrusions, but they are computationally expensive due to their high complexity. On the other hand, statistical methods such as the adaptive cumulative sum method and distance-based IDS have shown promise in detecting attacks quickly with less computational complexity. Still, they may not perform as well as DL-based methods. When choosing an intrusion detection method for in-vehicle network traffic, it is essential to consider the trade-off between accuracy and computational complexity. Furthermore, more research is needed to evaluate the generalizability of these methods to different datasets and their robustness to different types of attacks.
Based on the literature review presented and compared in Table 2, some research gaps can be identified:
• Limited research on statistical features: While some studies have explored statistical features, there is still a lack of research on effectively utilizing them to develop accurate and efficient IDSs for in-vehicle network traffic.
• Lack of comparative studies: Although various techniques have been proposed for detecting intrusions in in-vehicle network traffic, there is a lack of comparative studies that evaluate and compare the performance of these techniques. Comparative studies can help identify the strengths and weaknesses of different methods and provide insights into which methods are most effective in detecting intrusions in in-vehicle network traffic.
• Limited research on resource-constrained environments: Many existing studies have focused on developing IDSs for in-vehicle network traffic in resource-rich environments without computing power or memory constraints. However, there is a lack of research on developing effective IDSs for in-vehicle network traffic in resource-constrained environments, such as those found in many embedded systems.
• Lack of focus on new types of attacks: While the existing studies have proposed different approaches for detecting various types of attacks, there is a need for more research on identifying and detecting new types of attacks that may be specific to in-vehicle network traffic. As the automotive industry continues to evolve, attackers may develop new attack techniques specific to in-vehicle network traffic, and it is crucial to have IDSs that can effectively detect such attacks.
Despite their high performance, DL models can be computationally expensive due to their high complexity. Therefore, it is crucial to develop accurate IDSs for in-vehicle network traffic that utilize less computationally expensive ML models and statistical features.

III. FORMULATING THE PROBLEM OF INTRUSION DETECTION IN IN-VEHICLE NETWORKS
Intrusion detection in in-vehicle networks identifies abnormal events or attacks in the network traffic dataset. To frame this problem, the following notation is defined:
Let $DT = \{i_1, i_2, \ldots, i_N\}$ be the set of $N$ instances in the in-vehicle network traffic dataset, where each instance is a point in the $m$-dimensional feature space $I$. Thus, for an instance $i_j$, the features are denoted as $i_j = (f_{i1}, f_{i2}, f_{i3}, \ldots, f_{im})$, and $i_j \in I$.
To perform intrusion detection, we need to define a mapping $ID$ that maps the input space $I$ to an output space $O$, whose elements are the classes for network traffic classification. In binary classification, the output space consists of two classes, which can be denoted as $O = \{\text{intrusive}, \text{non-intrusive}\}$, $\{\text{normal}, \text{anomaly}\}$, $\{0, 1\}$, or $\{\text{positive}, \text{negative}\}$. In multi-class classification, the output space has more than two classes and can be denoted as $O = \{o_1, o_2, \ldots, o_i\}$, where $i > 2$.
This work aims to find a suitable mapping $ID: I \rightarrow O$ that classifies in-vehicle network traffic into attack classes based on a given network dataset. This study proposes a decision tree-based approach, along with feature fusion and stacking methods. The proposed approach is discussed in detail in the following section.
Strengths of this problem formulation include the precise definition of notations and the focus on identifying abnormal events in in-vehicle network traffic. Limitations of this formulation include the lack of discussion on the types of attacks that can occur in in-vehicle networks and the assumption that the network dataset is already given.

IV. DESIGN OF THE PROPOSED FEATURE FUSION AND STACKING-BASED IDS (FFS-IDS)
This work proposes the Feature Fusion and Stacking based IDS (FFS-IDS) for in-vehicle networks, as shown in Figure 1. The FFS-IDS leverages multiple features extracted from raw network traffic to classify traffic instances into intrusive and non-intrusive categories using ensemble learning of basic ML classifiers in a stacking approach. The proposed system operates in three phases, which are described below.

A. PHASE 1 - CONSTRUCTION OF THE BASIC DATA SET
The first phase involves extracting basic features from raw network traffic and constructing a benchmark dataset. These features capture relevant information about network traffic, including spatial, temporal, and content features. Each record in the dataset represents a network traffic instance in terms of m features as described in Section III. The raw data may contain noise, missing values, and values on non-uniform scales. Data preprocessing is applied to prepare the captured data for further processing by ML models. This involves handling
null values, removing noise, removing redundant and irrelevant information, and converting data attributes to a uniform scale.
To demonstrate the performance of the proposed FFS-IDS, I use the car-hacking dataset, which is available in CSV format and contains fields such as Timestamp, ID, DLC, D0-D7, and Tag. Table 3 describes the attributes of the car-hacking dataset.

TABLE 3. Attributes of the car-hacking dataset.
| Attribute | Description |
|---|---|
| Timestamp | The recorded time of attack message |
| ID | Identifier of CAN message in Hex |
| DLC | The number of data bytes sent on the network; varies from 0 to 8 |
| D0-D7 | Data bytes sent over the network |
| Tag | Tag of the message as normal (R) or attack (T) |

However, this raw data cannot be directly used for ML purposes. A CAN traffic preprocessor module is used to make it compatible with ML algorithms that require numeric input, as shown in Figure 2.
The preprocessing step involves encoding the ID field, which represents the unique code for each message sent on the CAN bus, into a numeric value ranging from 0 to the maximum number of IDs (Max IDs). In the car-hacking dataset, there are 29 unique CAN message IDs. The DLC field, indicating the number of bytes sent over the network, is normalized to a range of 0 to 1 using Eqs. 1 and 2 [33]. Here, $val_i$ is the initial value of feature $i$, while $Min_i$ and $Max_i$ are the minimum and maximum values of feature $i$, respectively [34].

$$\text{NormalizedValue}_i = \text{normalize}(\ln(val_i + 1)) \tag{1}$$
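The preprocessing described above (integer encoding of the ID field and log-based normalization of DLC) can be illustrated with a minimal sketch. Eq. 2 is not reproduced in this excerpt, so the sketch assumes that `normalize` is ordinary min-max scaling with the feature's minimum and maximum; the function names `encode_ids` and `log_min_max` and the sample values are hypothetical and are not taken from the paper.

```python
import math


def encode_ids(can_ids):
    """Map each distinct hexadecimal CAN ID string to an integer code 0..K-1."""
    unique_ids = sorted(set(can_ids), key=lambda h: int(h, 16))
    return {cid: code for code, cid in enumerate(unique_ids)}


def log_min_max(values):
    """Apply ln(v + 1) as in Eq. 1, then min-max scale to [0, 1].

    The min-max step is an assumed form of the 'normalize' function.
    """
    logged = [math.log(v + 1) for v in values]
    lo, hi = min(logged), max(logged)
    if hi == lo:
        # A constant feature carries no information; map it to 0.
        return [0.0 for _ in logged]
    return [(x - lo) / (hi - lo) for x in logged]


# Hypothetical sample values, for illustration only.
ids = ["0316", "018f", "0260", "0316"]
id_codes = encode_ids(ids)
encoded = [id_codes[c] for c in ids]
dlc_scaled = log_min_max([0, 2, 8, 8])
```

Sorting the IDs by their numeric value keeps the encoding deterministic across runs; any other stable ordering would serve equally well for tree-based classifiers.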
O(1). Therefore, the overall computational complexity of the proposed algorithm in terms of space and time is O(N).

Algorithm 1 FFS-IDS Algorithm
Require: DT_i: ith basic data subset, BA: basic learning algorithm, BM: basic learning model, ML: meta-learning algorithm
Ensure: Network traffic class predictions of the final meta-learning model (FM)
  Extract basic features from raw network traffic data
  Construct comprehensive data subsets from basic features using a combination and permutation approach
  Set i ← 1
  while i ≤ N do
    BM_i = BA(DT_i)
    i ← i + 1
  end while
  DT′ ← NULL
  Set i ← 1
  while i ≤ N do
    Set j ← 1
    while j ≤ N do
      EN_ij = BM_j(DT_j)
      DT′ += y(EN_ij)
      j ← j + 1
    end while
    i ← i + 1
  end while
  FM ← y(DT′)
  Return FM

The proposed system's overall effectiveness and accuracy depend significantly on the accuracy of the base models. To ensure high performance and computational efficiency, I carefully selected the decision tree algorithm as the base learning algorithm and the random forest algorithm as the meta-learning algorithm. The decision tree algorithm is well-suited for classification tasks, providing highly accurate results with minimal computations [35], [36]. On the other hand, the random forest algorithm is known to achieve high classification accuracy through multiple decision trees, even in the presence of noise and overfitting issues [36], [37].

V. EXPERIMENTAL SETUP AND IMPLEMENTATION
This section presents a comprehensive overview of the experimental setup, implementation, dataset, performance metrics, and results of the proposed approach for detecting intrusions in in-vehicle networks. I also provide a detailed analysis of the results and highlight significant observations from the comparative analysis with the outcomes of existing approaches.

A. EXPERIMENTAL SETUP
To implement the proposed FFS-IDS and the state-of-the-art methods, including DT, RF, LightGBM, and ExtraTree, I utilized the Anaconda distribution of Python and various libraries such as Pandas, NumPy, and Scikit-learn for loading the dataset, performing pre-processing operations, constructing a comprehensive feature combination-based data subset, and using decision tree and random forest algorithms as base learning and meta-learning algorithms, respectively. The experiments were conducted on a machine with an Intel Core i3-2330M CPU @ 2.20 GHz, 4 GB RAM, and 1 TB HDD running the Windows operating system. The results are recorded for FFS-IDS and the identified algorithms: DT, RF, LightGBM, and ExtraTree.
To ensure fair comparisons, I used the default hyperparameters of the identified algorithms as defined in the Scikit-learn library, as presented in Tables 4–8. I also utilized commonly used performance metrics to evaluate the effectiveness of the proposed FFS-IDS approach. I compared the results with existing approaches for detecting intrusions in in-vehicle networks. Furthermore, I analyzed the results and highlighted significant observations from the comparative analysis.

TABLE 4. Hyper-parameters of Decision Tree classifier.
| Parameter | Description | Default Value |
|---|---|---|
| criterion | Splitting criterion | "gini" (Gini impurity) |
| splitter | Split strategy | "best" (chooses best split) |
| max_depth | Maximum depth of tree | None (no limit) |
| min_samples_leaf | Minimum number of samples per leaf | 1 |
| min_samples_split | Minimum number of samples required for a split | 2 |
| max_features | Number of features to consider at each split | None (all features) |
| random_state | Seed for RNG | None |
| verbose | Logging level | 0 (no logging) |

TABLE 5. Hyper-parameters of Random Forest classifier.
| Parameter | Description | Default Value |
|---|---|---|
| n_estimators | Number of trees in the forest | 100 |
| max_depth | Maximum depth of a tree | None (no limit) |
| min_samples_split | Minimum number of samples required for split | 2 |
| min_samples_leaf | Minimum number of samples required in a leaf | 1 |
| bootstrap | Whether to use bootstrap sampling | True |
| max_features | Number of features to consider at each split | "auto" (sqrt of total features) |
| oob_score | Whether to compute out-of-bag scores | False |
| random_state | Seed for the random number generator | None |
| verbose | Logging level | 0 |

B. BENCHMARK DATASET
To assess the performance of the proposed FFS-IDS system in detecting intrusions in in-vehicle networks, I utilized the car hacking dataset introduced in [38]. This dataset comprises in-vehicle network traffic data recorded by ECUs over the CAN bus and includes normal and attack traffic. The dataset
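To make the implementation described in Section V-A more concrete, the following is a minimal sketch of how a stacking arrangement in the spirit of Algorithm 1 might be assembled with Scikit-learn, using a decision tree as the base learner and a random forest as the meta-learner, both with library defaults apart from a fixed seed. Several points are assumptions rather than details from the paper: the data are synthetic stand-ins for preprocessed CAN features, "feature fusion" is approximated as all pairs of feature columns, and out-of-fold predictions from `cross_val_predict` are used as meta-features to limit leakage into the meta-learner.

```python
from itertools import combinations

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for preprocessed CAN features (hypothetical data,
# not the car-hacking dataset). Label 1 = intrusive, 0 = non-intrusive.
rng = np.random.default_rng(0)
X = rng.random((600, 4))
y = (X[:, 0] + X[:, 2] > 1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0, stratify=y
)

# Assumed form of feature fusion: each pair of basic features is one data subset.
subsets = [list(cols) for cols in combinations(range(X.shape[1]), 2)]

base_models = []
meta_train_columns = []
for cols in subsets:
    tree = DecisionTreeClassifier(random_state=0)
    # Out-of-fold predictions become one meta-feature column per subset.
    oof = cross_val_predict(tree, X_train[:, cols], y_train, cv=3)
    meta_train_columns.append(oof)
    # Refit the base model on the full training subset for use at test time.
    base_models.append((cols, tree.fit(X_train[:, cols], y_train)))

# The meta-learner is trained on the stacked base-model predictions.
meta_X_train = np.column_stack(meta_train_columns)
meta_model = RandomForestClassifier(random_state=0).fit(meta_X_train, y_train)

meta_X_test = np.column_stack(
    [model.predict(X_test[:, cols]) for cols, model in base_models]
)
y_pred = meta_model.predict(meta_X_test)
accuracy = float((y_pred == y_test).mean())
```

Fixing `random_state` only makes the sketch repeatable; the paper's Tables 4 and 5 list the default value of None for that parameter.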
TABLE 10. Training and test datasets.
| Dataset | Type | Training | Test |
|---|---|---|---|
| DoS data subset instances | Attack | 393964 | 193557 |
| DoS data subset instances | Normal | 2062102 | 1016148 |
| Fuzzy data subset instances | Attack | 329502 | 162345 |
| Fuzzy data subset instances | Normal | 2242534 | 1104479 |
| Gear Spoofing data subset instances | Attack | 400719 | 196533 |
| Gear Spoofing data subset instances | Normal | 2576186 | 1269704 |
| RPM Spoofing data subset instances | Attack | 438245 | 216652 |
| RPM Spoofing data subset instances | Normal | 2658295 | 1308510 |

TABLE 11. Confusion matrix elements.
| Sr No. | Element | Definition |
|---|---|---|
| 1. | True Negative (TN) | Normal/non-intrusive behavior that is successfully labeled as normal/non-intrusive by the IDS. |
| 2. | True Positive (TP) | Intrusions that are successfully detected by the IDS. |
| 3. | False Positive (FP) | Normal/non-intrusive behavior that is wrongly classified as intrusive by the IDS. |
| 4. | False Negative (FN) | Intrusions that are missed by the IDS, and classified as normal/non-intrusive. |

| Actual \ Predicted | Normal | Attack |
|---|---|---|
| Normal | TN | FP |
| Attack | FN | TP |

instances.

$$CR = \frac{\text{Correctly classified instances}}{\text{Total number of instances}} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}$$

2) Detection rate or Recall: It is computed as the ratio between the number of correctly detected attacks and the total number of attacks.

$$DR = \frac{\text{Correctly detected attacks}}{\text{Total number of attacks}} = \frac{TP}{TP + FN} \tag{4}$$

3) False positive rate (FPR): It is defined as the ratio between the number of normal instances detected as attack and the total number of normal instances.

$$FPR = \frac{\text{Number of normal instances detected as attacks}}{\text{Total number of normal instances}} = \frac{FP}{FP + TN} \tag{5}$$

4) Precision (PR): It is the fraction of data instances predicted as positive that are actually positive.

$$PR = \frac{TP}{TP + FP} \tag{6}$$

5) F-measure (FM): For a given threshold, the FM is the harmonic mean of the precision and recall at that threshold.

$$FM = \frac{2}{\frac{1}{PR} + \frac{1}{Recall}} \tag{7}$$

VI. RESULTS AND DISCUSSION
This study compares the performance of the proposed FFS-IDS system with other commonly used classifiers, namely the decision tree [44] and the ensemble learning methods random forest [45], LightGBM [46], AdaBoost [47], and ExtraTree [48]. The evaluation of these methods is based on the car hacking dataset.
I conducted ten independent experiments using FFS-IDS and the other classifiers with their default hyperparameters. The performance of these classifiers was evaluated using commonly used performance metrics. To compare the results of these experiments, the experimental results are plotted in Figures 3–6.

FIGURE 3. Comparison of the Detection Performance of the FFS-IDS System and Other State-of-the-art Methods on the DoS Attack Dataset.

Tables 13–16 summarize the comparative analysis of the proposed FFS-IDS system with the existing approaches for detecting intrusion in in-vehicle networks based upon car
FIGURE 4. Comparison of the Detection Performance of the FFS-IDS System and Other State-of-the-art Methods on the Fuzzy Attack Dataset.
FIGURE 5. Comparison of the Detection Performance of the FFS-IDS System and Other State-of-the-art Methods on the Gear spoofing Attack Dataset.
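The performance metrics of Eqs. (3)–(7), on which the comparisons in these figures rest, follow directly from the four confusion-matrix counts. The sketch below shows one way to compute them; the function name `ids_metrics` and the example counts are hypothetical and do not correspond to any result reported in the paper, and the code assumes every denominator is non-zero.

```python
def ids_metrics(tp, tn, fp, fn):
    """Compute the metrics of Eqs. (3)-(7) from confusion-matrix counts.

    Assumes all denominators are non-zero (at least one attack, one normal
    instance, and one positive prediction).
    """
    cr = (tp + tn) / (tp + tn + fp + fn)   # classification rate, Eq. (3)
    dr = tp / (tp + fn)                    # detection rate / recall, Eq. (4)
    fpr = fp / (fp + tn)                   # false positive rate, Eq. (5)
    pr = tp / (tp + fp)                    # precision, Eq. (6)
    fm = 2 / (1 / pr + 1 / dr)             # F-measure (harmonic mean), Eq. (7)
    return {"CR": cr, "DR": dr, "FPR": fpr, "PR": pr, "FM": fm}


# Hypothetical counts chosen only to illustrate the arithmetic.
metrics = ids_metrics(tp=90, tn=880, fp=20, fn=10)
```

With these illustrative counts, CR is 970/1000 = 0.97 and DR is 90/100 = 0.90, while FPR is 20/900, which is about 0.022. A high CR can therefore coexist with a noticeably lower precision (90/110, about 0.82) when normal traffic dominates.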
• Threats to Internal Validity: The experimental design involves using default hyperparameters for machine learning classifiers, which could influence the internal validity as the chosen parameters may not be optimal for the specific characteristics of the dataset. Moreover, the proposed FFS-IDS algorithm's performance is evaluated based on a specific configuration, and changes in the dataset or algorithmic parameters might impact the results.
• Threats to Construct Validity: The feature extraction techniques employed in this study focus on specific aspects of network traffic. Variations in network architectures or the introduction of new attack methodologies might threaten the construct validity, as the chosen features may not comprehensively cover all potential intrusions.
• Threats to Conclusion Validity: The conclusions drawn from the results are based on the specific dataset, experimental setup, and evaluation metrics chosen. Changes in any of these elements or introducing new metrics could potentially alter the conclusions drawn from this study.

VIII. CONCLUSIONS AND FUTURE WORK
The increasing number of ECUs in modern vehicles has led to an increasingly connected internal network, the CAN, which has made them vulnerable to malicious attacks. This work proposed an effective IDS for in-vehicle networks called FFS-IDS, which uses feature fusion and stacking-based ensemble learning. FFS-IDS fuses multiple features extracted from raw network traffic and classifies traffic instances into intrusive and non-intrusive categories based on stacking ensemble learning of basic ML classifiers.
The experimental results demonstrated that FFS-IDS outperformed state-of-the-art IDSs in terms of detection performance, achieving detection accuracies of up to 99% for DoS, Gear spoofing, and RPM spoofing attacks, and up to 97.5% for Fuzzy attacks on the car hacking benchmark dataset. This research demonstrates the effectiveness and practicality of FFS-IDS for detecting intrusions in in-vehicle networks.
The future work outlined in the paper encompasses addressing identified limitations and enhancing the proposed FFS-IDS for in-vehicle networks. The paper acknowledges the constraints of using a single dataset for evaluation and default hyperparameters for machine learning classifiers. To
Additionally, a detailed analysis of resource consumption will be conducted, including memory usage, CPU and GPU utilization, and network bandwidth requirements. The authors aim to compare the resource consumption of their approach with existing IDS solutions, evaluating its feasibility for real-world deployment.
The optimization focus includes exploring various techniques to enhance the efficiency of the FFS-IDS algorithm. This involves investigating alternative feature fusion methods, optimizing data subset construction algorithms, and exploring lightweight stacking classifier architectures. The overarching goal is to reduce execution time and resource consumption while maintaining or improving the detection accuracy of the system.
Moreover, the paper proposes investigating hardware-specific adaptations, tailoring the FFS-IDS algorithm for platforms like embedded devices or edge computing environments. This involves developing specialized implementations that leverage the strengths of available hardware resources while minimizing constraints.

REFERENCES
[1] Zixiang Bi, Guoai Xu, Guosheng Xu, Miaoqing Tian, Ruobing Jiang, and Sutao Zhang. Intrusion detection method for in-vehicle CAN bus based on message and time transfer matrix. Security and Communication Networks, 2022, 2022.
[2] Joshua E Siegel, Dylan C Erb, and Sanjay E Sarma. A survey of the connected vehicle landscape - architectures, enabling technologies, applications, and development areas. IEEE Transactions on Intelligent Transportation Systems, 19(8):2391–2406, 2017.
[3] Wooyeon Jo, SungJin Kim, Hyunjin Kim, Yeonghun Shin, and Taeshik Shon. Automatic whitelist generation system for ethernet based in-vehicle network. Computers in Industry, 142:103735, 2022.
[4] Jiajia Liu, Shubin Zhang, Wen Sun, and Yongpeng Shi. In-vehicle network attacks and countermeasures: Challenges and future directions. IEEE Network, 31(5):50–58, 2017.
[5] Stephen Checkoway, Damon McCoy, Brian Kantor, Danny Anderson, Hovav Shacham, Stefan Savage, Karl Koscher, Alexei Czeskis, Franziska Roesner, and Tadayoshi Kohno. Comprehensive experimental analyses of automotive attack surfaces. In 20th USENIX Security Symposium (USENIX Security 11), 2011.
[6] Wei Lo, Hamed Alqahtani, Kutub Thakur, Ahmad Almadhor, Subhash Chander, and Gulshan Kumar. A hybrid deep learning based intrusion detection system using spatial-temporal representation of in-vehicle network traffic. Vehicular Communications, 35:100471, 2022.
[7] Li Yang and Abdallah Shami. A transfer learning and optimized CNN based intrusion detection system for internet of vehicles. arXiv preprint arXiv:2201.11812, 2022.
[8] Md Delwar Hossain, Hiroyuki Inoue, Hideya Ochiai, Doudou Fall, and Youki Kadobayashi. An effective in-vehicle CAN bus intrusion detection system using CNN deep learning approach. In GLOBECOM 2020-2020 IEEE Global Communications Conference, pages 1–6. IEEE, 2020.
overcome these limitations, additional feature extraction tech- [9] Muhammad Alolaiwy, Murat Tanik, and Leon Jololian. From cnns to
niques can be explored to enhance the detection performance adaptive filter design for digital image denoising using reinforcement q-
learning. In SoutheastCon 2021, pages 1–8. IEEE, 2021.
of IDSs. Furthermore, the intention is to fine-tune the hyper-
[10] Hyun Min Song, Jiyoung Woo, and Huy Kang Kim. In-vehicle network
parameters of base algorithms, ensuring a more robust and intrusion detection using deep convolutional neural network. Vehicular
accurate IDS. Communications, 21:100198, 2020.
The future research directions involve empirical validation [11] Li Yang, Abdallah Moubayed, Abdallah Shami, Parisa Heidari, Amine
Boukhtouta, Adel Larabi, Richard Brunner, Stere Preda, and Daniel Mi-
and optimization of the FFS-IDS algorithm. The authors pro- gault. Multi-perspective content delivery networks security framework
pose conducting thorough experiments to measure the execu- using optimized unsupervised anomaly detection. IEEE Transactions on
tion time on diverse hardware configurations, analyzing each Network and Service Management, 2021.
[12] Geeta Kocher and Gulshan Kumar. Machine learning and deep learning
algorithmic stage’s time consumption. Scalability concern- methods for intrusion detection systems: recent developments and chal-
ing dataset size and complexity will be rigorously assessed. lenges. Soft Computing, 25(15):9731–9763, 2021.
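The per-stage execution-time measurements proposed above could be collected with a small harness along the following lines. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function profile_stages, the three stage functions, and the sample CAN frames are all hypothetical placeholders, and the toy classification rule merely stands in for the trained stacking model.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

# A stage is a (name, function) pair; each function maps one input to one output.
Stage = Tuple[str, Callable[[Any], Any]]


def profile_stages(stages: List[Stage], data: Any,
                   repeats: int = 5) -> Tuple[Any, Dict[str, float]]:
    """Run the stages in order and record the best-of-`repeats` wall-clock
    time (in seconds) of each one. Stages are assumed to be pure functions,
    since each stage is re-run on the same input `repeats` times."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings: Dict[str, float] = {}
    for name, fn in stages:
        best = float("inf")
        out = None
        for _ in range(repeats):
            start = time.perf_counter()
            out = fn(data)
            best = min(best, time.perf_counter() - start)
        timings[name] = best
        data = out  # the output of this stage feeds the next stage
    return data, timings


# --- Hypothetical placeholder stages (assumptions, not the paper's code) ---

def extract_features(frames):
    # Frames are (timestamp, can_id, payload) tuples.
    payload_feats = [(can_id, sum(payload)) for _, can_id, payload in frames]
    gaps = [0.0] + [b[0] - a[0] for a, b in zip(frames, frames[1:])]
    return payload_feats, gaps


def fuse_features(groups):
    # Concatenate the two feature groups into one vector per frame.
    payload_feats, gaps = groups
    return [(*p, g) for p, g in zip(payload_feats, gaps)]


def classify(vectors):
    # Toy rule standing in for a trained classifier: flag CAN ID 0.
    return [1 if v[0] == 0 else 0 for v in vectors]


STAGES: List[Stage] = [
    ("extract", extract_features),
    ("fuse", fuse_features),
    ("classify", classify),
]

# Made-up sample traffic, for illustration only.
frames = [
    (0.0000, 0x316, bytes([5, 33, 104])),
    (0.0010, 0x000, bytes(8)),
    (0.0013, 0x000, bytes(8)),
    (0.0100, 0x18F, bytes([254, 91])),
]

labels, timings = profile_stages(STAGES, frames, repeats=3)
for name, seconds in timings.items():
    print(f"{name:>8}: {seconds * 1e6:.1f} us")
```

Taking the minimum over several repeats reduces the influence of scheduler noise, which matters on embedded or edge hardware where single measurements can vary widely. Running the same harness on each target platform would give directly comparable per-stage figures.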
ALI ALTALBE received his PhD in Information Technology from The University of Queensland, Australia, and his MSc in Information Technology from Flinders University, Australia. He is currently working as an associate professor in the Department of IT.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/