0% found this document useful (0 votes)
30 views13 pages

Electronics: Machine Learning-Based Anomaly Detection For Securing In-Vehicle Networks

Uploaded by

anila kousar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
30 views13 pages

Electronics: Machine Learning-Based Anomaly Detection For Securing In-Vehicle Networks

Uploaded by

anila kousar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

electronics

Article
Machine Learning-Based Anomaly Detection for Securing
In-Vehicle Networks
Asma Alfardus * and Danda B. Rawat *

Data Science and Cybersecurity Center (DSC2), Department of Electrical Engineering and Computer Science,
Howard University, Washington, DC 20059, USA
* Correspondence: [Link]@[Link] (A.A.); [Link]@[Link] (D.B.R.)

Abstract: In-vehicle networks (IVNs) are networks that allow communication between different
electronic components in a vehicle, such as infotainment systems, sensors, and control units. As these
networks become more complex and interconnected, they become more vulnerable to cyber-attacks
that can compromise safety and privacy. Anomaly detection is an important tool for detecting
potential threats and preventing cyber-attacks in IVNs. The proposed machine learning-based
anomaly detection technique uses deep learning and feature engineering to identify anomalous
behavior in real-time. Feature engineering involves selecting and extracting relevant features from
the data that are useful for detecting anomalies. Deep learning involves using neural networks
to learn complex patterns and relationships in the data. Our experiments show that the proposed
technique have achieved high accuracy in detecting anomalies and outperforms existing state-of-the-
art methods. This technique can be used to enhance the security of IVNs and prevent cyber-attacks
that can have serious consequences for drivers and passengers.

Keywords: IVNs; anomaly detection; cybersecurity; machine learning; deep learning; feature engineering

1. Introduction
The automotive industry has undergone a significant transformation in recent years
Citation: Alfardus, A.; Rawat, D.B. due to the proliferation of in-vehicle networks (IVNs) and their integration with other
Machine Learning-Based Anomaly technologies such as sensors, actuators, and controllers. The resulting complexity and
Detection for Securing In-Vehicle interconnections of these systems have created new challenges for cybersecurity, as IVNs are
Networks. Electronics 2024, 13, 1962. increasingly targeted by malicious actors seeking to exploit their vulnerabilities. According
[Link] to a report by the National Highway Traffic Safety Administration (NHTSA), the average
electronics13101962 number of cyber incidents involving automobiles has increased significantly in the last
Academic Editor: Aryya few years, highlighting the need for effective cybersecurity measures in the automotive
Gangopadhyay industry [1]. Anomaly detection is an essential tool for detecting potential threats and
preventing cyber-attacks in IVNs. Anomaly detection techniques aim to identify patterns of
Received: 15 February 2024 behavior that are outside the norm and may indicate malicious activity. Traditional methods
Revised: 31 March 2024
of anomaly detection in IVNs have focused on rule-based or statistical techniques. However,
Accepted: 15 April 2024
these methods have limitations in terms of their ability to handle the high-dimensional
Published: 16 May 2024
and dynamic nature of IVN data. Machine learning has shown great potential in various
applications, including cybersecurity. In recent years, machine learning-based anomaly
detection techniques have been applied to a wide range of cybersecurity domains, including
Copyright: © 2024 by the authors.
network intrusion detection, malware detection, and fraud detection [2]. The success of
Licensee MDPI, Basel, Switzerland. these techniques in detecting anomalies in such domains has led to their application in
This article is an open access article IVNs for cybersecurity.
distributed under the terms and The need for effective cybersecurity measures in IVNs has been recognized by the
conditions of the Creative Commons automotive industry, and several initiatives have been launched to improve the security of
Attribution (CC BY) license (https:// IVNs. For example, the Auto-ISAC (Automotive Information Sharing and Analysis Center)
[Link]/licenses/by/ was established in 2015 to share information and best practices related to cybersecurity in
4.0/). the automotive industry [3].

Electronics 2024, 13, 1962. [Link] [Link]


Electronics 2024, 13, 1962 2 of 13

Yoshizawa et al. [4] address the growing interest in Vehicle-to-Everything (V2X) com-
munication from both industry and academia, highlighting the focus on pilot projects
testing its capabilities and feasibility. With the European Union responsible for half of the
global road vehicle exports and under its stringent security and data protection laws, it
is crucial for V2X initiatives to integrate security and privacy considerations alongside
road safety. However, findings from a survey of relevant standards, research outputs, and
EU pilot projects reveal significant security and privacy issues and inconsistencies across
the standards.
Several feature engineering techniques have been proposed for IVN anomaly detection,
including wavelet transform [5] and principal component analysis (PCA) [6]. In general,
IVNs are becoming increasingly complex and interconnected, making them vulnerable
to cyber-attacks. Anomaly detection is a crucial tool for identifying potential threats and
preventing cyber-attacks in IVNs. Machine learning-based anomaly detection techniques
offer a promising solution to this problem by leveraging the power of machine learning to
learn complex patterns and relationships in the data. The key challenge in using machine
learning for anomaly detection in IVNs is the need to extract relevant features from the raw
data. The IVNs architecture is shown in Figure 1.

Figure 1. IVNs architecture with electronic control units and buses.

The increasing complexity and inter-connectivity of in-vehicle networks (IVNs) ne-


cessitate robust cybersecurity measures to safeguard against potential threats. Leveraging
machine learning-based anomaly detection techniques holds promise in mitigating these
risks by enabling the identification of anomalous behavior in real-time. In this paper,
we propose a novel machine learning-based anomaly detection technique for IVNs that
combines deep learning and feature engineering to detect anomalous behavior in real-
time. Our approach addresses the limitations of existing methods by using a combination
of feature engineering and deep learning to extract relevant features and learn complex
patterns in the data. We focus on using recurrent neural networks (RNNs) for anomaly
detection in IVNs, as they are well-suited for handling time-series data and can capture
long-term dependencies in the data. We also incorporate a wavelet transform-based feature
extraction technique to reduce the dimensionality of the data and improve the performance
of the RNN. Our paper makes several significant contributions to the field of in-vehicle
network security. Firstly, we propose a novel machine learning-based anomaly detection
technique specifically tailored for IVNs, leveraging deep learning and feature engineering
to enhance detection accuracy. Secondly, we conduct comprehensive experiments using
Electronics 2024, 13, 1962 3 of 13

real-world IVN data, demonstrating the effectiveness and superiority of our proposed
methodology over existing state-of-the-art techniques. Thirdly, we provide insights into
the importance of feature engineering in conjunction with deep learning, highlighting its
crucial role in addressing the unique challenges posed by IVN data. Overall, our research
offers valuable advancements in IVN security, paving the way for more robust and efficient
anomaly detection systems in automotive environments. The paper is structured as follows:
Section 2 provides a detailed description of the related work in the field. In Section 3, the
proposed methodology is presented, along with the subsequent experiments conducted to
evaluate its effectiveness. The results of these experiments are reported in Section 4. Finally,
Section 5 concludes the paper.

2. Related Work
The field of anomaly detection has been extensively studied in recent years, with
numerous approaches proposed for detecting anomalies in various domains, including
cybersecurity. Rule-based approaches involve specifying a set of rules that define nor-
mal behavior in the IVN and detecting anomalies based on deviations from these rules,
and while rule-based approaches can be effective in detecting known attacks, they have
limitations in terms of their ability to handle unknown or evolving attacks. A rule-based
approach proposed by Pires et al. [7] used a set of rules based on the expected frequency of
messages in the IVN to detect anomalies. However, this approach may not be effective in
detecting sophisticated attacks that involve subtle changes in the frequency or timing of
messages. Statistical approaches involve modeling the distribution of the normal behavior
in the IVN and detecting anomalies based on deviations from this distribution. These
approaches are widely used in anomaly detection and have been applied to IVNs as well.
Wang et al. [8] proposed a statistical approach based on a Gaussian mixture model (GMM)
to detect anomalies in the IVN. However, statistical approaches have limitations in terms
of their ability to handle the high-dimensional and dynamic nature of IVN data. Machine
learning-based approaches involve training a model on a set of labeled data to learn the
patterns and relationships in the data and detecting anomalies based on deviations from the
learned model. These approaches have shown great promise in detecting anomalies in IVNs
due to their ability to learn complex patterns and relationships in the data. Chen et al. [9]
proposed a machine learning-based approach that combines PCA and the support vector
machine (SVM) for anomaly detection in IVNs. Their approach achieved high accuracy
and outperformed traditional statistical approaches.
Hybrid approaches combine multiple techniques, such as rule-based, statistical, and
machine learning-based approaches, to improve the accuracy and robustness of anomaly
detection. For example, Li et al. [10] proposed a hybrid approach that combines wavelet
transform and LSTM (long short-term memory) for anomaly detection in IVNs. Their
approach achieved high accuracy and outperformed traditional statistical approaches.
The evaluation of anomaly detection techniques in IVNs requires appropriate metrics to
measure the performance of the techniques. Several evaluation metrics have been proposed
in the literature, including accuracy, precision, recall, F1 score, and area under the curve
(AUC) of the receiver operating characteristic (ROC) curve. These metrics can be used to
compare the performance of different techniques and to evaluate the trade-off between
detection rate and false positive rate. Another approach to anomaly detection in IVNs is
clustering-based methods, which aim to group similar data points together and identify
anomalies as data points that do not belong to any cluster. Zhang et al. [11] proposed a
clustering-based method that uses a density-based clustering algorithm to group data points
and identify anomalies based on their distance from the cluster centers. The method was
tested on a dataset of network traffic in a vehicle and achieved high accuracy in detecting
anomalies. However, the method has limitations in handling complex data and requires
careful selection of clustering parameters. In recent years, deep learning-based methods
have gained popularity in anomaly detection due to their ability to automatically extract
relevant features from the raw data. Zong et al. [12] proposed a deep auto-encoder-based
Electronics 2024, 13, 1962 4 of 13

method for anomaly detection in IVNs, which uses an unsupervised learning approach to
learn a compact representation of the input data. The method was evaluated on a dataset of
network traffic in a vehicle and achieved high accuracy in detecting anomalies. However,
the method has limitations in handling imbalanced data and requires a large amount of
training data.
Wang et al. [13] discuss the increasing risk of remote wireless attacks on in-vehicle
networks due to advancements in 5G and Internet of Vehicles (IoV) technologies. As a
protective measure, the authors propose a novel distributed anomaly detection system
tailored for the vehicular controller area network (CAN) bus, employing Hierarchical
Temporal Memory (HTM) technology. HTM is utilized to predict real-time network flow
data, relying on previously learned states to enhance detection accuracy. To advance the
system’s effectiveness, the authors also refine the mechanism used to score anomalies.
They conduct experiments involving synthetic data modifications and replay attacks to
validate their system. The performance of the HTM-based system is compared against
traditional detection models using recurrent neural networks (RNNs) and hidden Markov
models (HMMs). The results demonstrate that the HTM-based anomaly detection system
outperforms the others, particularly in metrics such as the area under the receiver operating
characteristic (ROC) curve, precision, and recall, indicating a significant improvement in
identifying and responding to anomalies in vehicle network security.

3. Proposed Approach
In this section, we have proposed methodology for anomaly detection in IVNs based
on a combination of deep learning and feature engineering. Feature engineering involves
creating new features or transforming existing features to improve the performance of
a machine learning model. It can include operations such as scaling, normalization etc.,
aimed at making the data more suitable for the model. In domains with complex and
high-dimensional data such as in-vehicle networks, feature engineering plays a crucial
role alongside deep learning models. Despite the automatic feature learning capabilities of
deep learning, feature engineering remains indispensable due to its ability to incorporate
domain knowledge and optimize input data for the learning task and model capabilities.
Specifically, feature engineering allows researchers to identify and extract relevant features
that encapsulate important domain-specific information, such as sensor readings, network
traffic patterns, or control unit interactions. Additionally, feature engineering facilitates
the transformation of raw data into a more meaningful and manageable representation,
reducing computational complexity and training time. The overall framework is illustrated
in Figure 2 and consists of the following main components: data preprocessing, feature
extraction, and anomaly detection.

Figure 2. The proposed framework for deep learning-based anomaly detection in IVNs.

3.1. Data Preprocessing


The first step in the proposed methodology is data preprocessing, which involves
cleaning and filtering the raw network traffic data to remove noise and irrelevant informa-
Electronics 2024, 13, 1962 5 of 13

tion. The cleaning and filtering steps in data preprocessing are common in data science
and machine learning, and various techniques have been proposed in the literature. For
instance, missing data imputation techniques such as mean imputation, regression imputa-
tion, and K-nearest neighbors imputation have been widely used [14]. Outlier detection
methods such as clustering-based, distance-based, and density-based methods have also
been proposed by Chandola et al. [15]. The cleaning step involves removing any missing
or incomplete data points, as well as any outliers that may skew the analysis. In the context
of IVNs, missing data points may occur due to network latency or communication errors,
and outliers may occur due to network attacks or malfunctions. Various techniques can be
used to handle missing data, such as imputation or deletion, depending on the extent and
nature of the missingness.
In the filtering step, irrelevant information is removed from the data to reduce the
dimensionality of the input space and improve the efficiency of the model. For instance,
certain network traffic features such as source and destination IP addresses may not be
relevant for detecting anomalies in IVNs, as they are likely to be constant or follow a
predictable pattern. Feature selection and extraction techniques can be used to identify the
most relevant and informative features for anomaly detection, while reducing the noise
and redundancy in the data. After removing irrelevant information, the filtering process
also involves addressing data inconsistencies and errors that could adversely affect the
model’s performance. Techniques such as outlier detection and data normalization further
enhance the quality and reliability of the dataset used for anomaly detection.
Once the data have been preprocessed and filtered, they are transformed into a suit-
able format for input to the deep learning model. This may involve standardization or
normalization of the data to ensure that all features have the same scale and distribution.
Additionally, the data may be split into training, validation, and test sets, with the training
set used to train the model, the validation set used to optimize the hyper-parameters and
prevent overfitting, and the test set used to evaluate the performance of the model on
unseen data. Mathematically, the data preprocessing steps can be expressed as follows:
Let X denote the raw network traffic data and Y denote the target variable indicating
anomalous or normal behavior. The preprocessing steps can be defined as follows:

Cleaning : Xclean = clean( X ) (1)

Filtering : X f iltered = f ilter ( Xclean ) (2)


Trans f ormation : Xtrans f ormed = trans f orm( X f iltered ) (3)

Splitting : ( Xtrain , Ytrain ), ( Xval , Yval ),


(4)
( Xtest , Ytest ) = split( Xtransformed , Y )
Here, clean() represents the cleaning function, filter() represents the filtering function,
transform() represents the transformation function, and split() represents the splitting func-
tion. In addition to these steps, it is important to consider the impact of data imbalance
on the performance of the anomaly detection model. Addressing data imbalance through
techniques such as oversampling of minority classes or adjusting class weights during
model training can help improve the model’s ability to detect anomalies effectively.

3.2. Feature Extraction


The second step is feature extraction, which involves extracting relevant features from
the preprocessed data. We use a combination of handcrafted features and learned features
to capture both the global and local characteristics of the data. Specifically, we extract the
following features:
1. Statistical features including mean, variance, skewness, and kurtosis of the data.
2. Frequency-domain features including power spectral density (PSD) and
spectral entropy.
Electronics 2024, 13, 1962 6 of 13

3. Time-domain features including auto-correlation and cross-correlation between differ-


ent network traffic signals.
4. Learned features: we use a convolutional neural network (CNN) to learn high-level
features from the raw network traffic data.
The combination of these features provides a comprehensive representation of the
network traffic data, which is used for anomaly detection.

3.3. Anomaly Detection


The final step is anomaly detection, which involves using a machine learning model to
classify the preprocessed data as normal or anomalous. Deep learning models have shown
promise in detecting anomalies in IVNs due to their ability to learn complex patterns and
relationships in the data. We use a deep neural network (DNN) for this task, which takes
the extracted features as input and produces a binary classification output. The DNN
is trained on a labeled dataset of network traffic data, with the goal of minimizing the
classification error.
The proposed methodology is formulated as follows:
Given a set of preprocessed network traffic data X = { x1 , x2 , . . . , xn }, where each xi
is a 224-dimensional vector of extracted features, the goal is to classify each data point as
normal or anomalous. Let Y = {y1 , y2 , . . . , yn } denote the corresponding labels, where
yi = 1 if xi is normal and yi = 0 if xi is anomalous. We use a DNN with multiple hidden
layers to model the relationship between the input features and the output label. The DNN
is trained using the binary cross-entropy loss function, defined as follows.

L = −1/n ∗ sum(yi ∗ log( pi ) + (1 − yi ) ∗ log(1 − pi )) (5)


where pi is the predicted probability of xi being normal, and yi is the ground-truth label.
The overall objective is to minimize the binary cross-entropy loss function L by ad-
justing the parameters of the DNN using backpropagation. We evaluate the performance
of the proposed methodology on a dataset of network traffic in a vehicle, which consists
of both normal and anomalous data. Overall, the proposed methodology combines data
preprocessing, feature engineering, and deep learning-based anomaly detection to detect
anomalous behavior in IVNs with high accuracy and efficiency.

4. Experiments and Results Analysis


To evaluate the performance of the proposed methodology for anomaly detection in
IVNs, we conducted a series of experiments using a publicly available dataset of network
traffic in a modern vehicle. The dataset contains network traffic data collected from various
sensors and devices in the vehicle, including cameras, lidar, radar, and GPS.

4.1. Dataset
The dataset used in our experiments was obtained from the IVS-Hackathon chal-
lenge [16]. The IVS-Hackathon dataset used in our experiments is a publicly available
dataset specifically designed for an Intrusion Detection System (IDS) in Intelligent Vehicle
Systems (IVS). The dataset includes a wide range of network traffic data recorded from a
modern vehicle equipped with various sensors and devices. It provides a realistic represen-
tation of the network traffic in IVS and enables researchers to evaluate the effectiveness of
IDS algorithms in detecting anomalous behavior. The dataset consists of 10,000 network
packets captured from the vehicle network. Each packet contains a set of features, includ-
ing source and destination IP addresses, protocol type, packet size, and timestamp. The
exact number of columns in this dataset may vary depending on the specific attributes
captured for each packet. After feature extraction, the dataset typically undergoes a trans-
formation process where new features are derived or selected. The number of columns
post-extraction can vary based on the feature engineering techniques employed. Typically,
feature extraction aims to reduce dimensionality while retaining relevant information,
so the number of columns will be fewer compared to the original dataset. The dataset
Electronics 2024, 13, 1962 7 of 13

includes both normal and anomalous traffic, with the anomalies introduced by injecting
various network attacks and malfunctions into the network. The anomalies were classified
into four categories: Denial-of-Service (DoS), Man-in-the-Middle (MitM), Remote-to-Local
(R2L), and Local-to-Remote (L2R) attacks. Each class contains an equal number of rows,
totaling 2500 rows per class. This balanced distribution ensures fairness in evaluating the
performance of anomaly detection algorithms across different attack types. The dataset was
split into training, validation, and test sets using a predefined scheme, ensuring consistency
in evaluations. Typically, a common split ratio, such as 70% for training, 15% for validation,
and 15% for testing, was employed to ensure robust model evaluation.
To ensure the authenticity and validity of the dataset, the data was collected using a
real vehicle and a real network environment. The dataset also includes a set of pre-defined
train and test splits to enable researchers to perform consistent evaluations of different
IDS algorithms. In our experiments, we used the IVS-Hackathon dataset to evaluate the
effectiveness of our proposed methodology in detecting anomalous behavior in IVNs. The
dataset provided us with a realistic and diverse set of network traffic data, which enabled
us to evaluate the generalizability and robustness of our model.

4.2. Data Preprocessing and Feature Extraction


Before training the deep learning model, the raw network traffic data was preprocessed
to remove noise and irrelevant information. The cleaning step involved removing missing
and incomplete data points, as well as any outliers that may skew the analysis. The filtering
step involved removing irrelevant features, such as source and destination IP addresses,
which are unlikely to be informative for detecting anomalies in IVNs. The data were
then transformed into a format suitable for input to the deep learning model, including
standardization and normalization of the data. To reduce the dimensionality of the input
space and improve the efficiency of the model, feature extraction was performed on the
preprocessed data using principal component analysis (PCA). PCA is a popular technique
for dimensionality reduction that identifies the most relevant features in the data and
transforms the data into a lower-dimensional space. In our experiments, we retained the
top 20 principal components, which accounted for over 95% of the total variance in the data.

4.3. Deep Learning Model


We trained a deep auto-encoder model for anomaly detection in IVNs, which is a
type of deep neural network that learns to compress and reconstruct the input data. It is
a feed-forward neural network that learns to encode and decode high-dimensional data
through multiple layers of hidden units. The goal of an auto-encoder is to reconstruct
the input data from a compressed representation learned by the encoder. The encoder
and decoder are typically symmetric, and the loss function used during training is based
on the difference between the input data and the reconstructed output. The use of deep
auto-encoder models for anomaly detection in IVNs has been explored in several studies,
including Deng et al. [17] and Xie et al. [18]. The Adam optimization algorithm and the
ReLU activation function are commonly used in deep auto-encoder models for anomaly
detection, as shown in Zong et al. [12] and Cheng et al. [19]. The mean squared error loss
function is also a common choice for deep auto-encoder models, as seen in Gan et al. [20]
and Wang et al. [21].
The model consisted of an input layer, followed by two hidden layers and an output
layer. The input layer had 20 neurons corresponding to the 20 principal components
obtained from the feature extraction step. The two hidden layers had 10 and 5 neurons,
respectively, and used the rectified linear unit (ReLU) activation function. The output layer
had 20 neurons, corresponding to the reconstructed input data. The model was trained
using the Adam optimization algorithm with a learning rate of 0.001 and a batch size of 32.
The model was trained for 100 epochs, with early stopping based on the validation loss
to prevent overfitting. The loss function used was the mean squared error (MSE), which
measures the difference between the original input data and the reconstructed output data.
Electronics 2024, 13, 1962 8 of 13

4.4. Evaluation Metrics


In evaluating the performance of an attack detection system, various metrics, including
accuracy, precision, recall, and F1 score, are used. The classification outcomes of such a
system can be grouped into four distinct categories based on specific conditions. True
positive (TP) refers to a correctly identified instance where an attack is present, while
true negative (TN) refers to a correctly identified absence of an attack. False positive (FP)
occurs when the system incorrectly identifies normal traffic or behavior as an attack, while
false negative (FN) refers to a failure to identify an actual attack. TP, TN, FP, and FN are
four numerical conditions used to evaluate the performance of a classification system in
attack detection. Based on these conditions, several performance metrics can be defined
to quantify the system’s effectiveness in detecting attacks. These include the accuracy,
precision, recall, and F1 score.

TP
Precision = (6)
TP + FP
TP + Tn
Accuracy = (7)
TP + TN + FN + FP
TP
Recall = (8)
TP + FN
precision ∗ recall
F1 score = 2 ∗ (9)
precision + recall
Accuracy is a measure of the proportion of all correctly classified instances TP and
TN out of all instances in the dataset. Precision is a measure of the proportion of correctly
identified positive instances TP out of all instances classified as positive TP and FP. Recall,
also known as the true positive rate (TPR), is a measure of the proportion of actual positive
instances TP that the system correctly identifies out of all positive instances in the dataset
TP and FN. The F1 score is a commonly used metric in classification tasks that measures
the balance between precision and recall. It is the harmonic mean of precision and recall.

4.5. Fine-Tuning Hyper-Parameters


Fine-tuning hyper-parameters is a critical process in optimizing the performance of
machine learning models for in-vehicle network (IVN) anomaly detection. Key parameters
such as learning rate, batch size, and the number of layers in the neural network significantly
influence the model’s ability to accurately identify anomalies in the unique context of IVNs.

4.5.1. Hyper-Parameter Sensitivity


We have conducted a detailed analysis of the model’s performance sensitivity to key
hyper-parameters such as learning rate, batch size, and the number of layers in the neural
network as shown in Table 1 and Figure 3. This exploration can provide insights into
optimizing the model for different IVN environments.

Table 1. Hyper-parameter tuning results.

Hyper-Parameter LR Batch Size Num Layers Accuracy Precision Recall F1 Score


Initial Configuration 0.001 64 3 0.92 0.89 0.94 0.91
Tuned Configuration 0.0005 128 4 0.95 0.93 0.97 0.95
Electronics 2024, 13, 1962 9 of 13

Figure 3. Hyper-parameter sensitivity analysis by understanding the Impact of Model Parameters


on Performance.

4.5.2. Grid Search and Cross-Validation


We performed a grid search coupled with cross-validation to systematically identify
the most effective combination of hyper-parameters, ensuring robustness and reliability in
different scenarios.

4.6. Experimental Results


The performance of the proposed methodology is evaluated using various metrics,
including accuracy, precision, recall, and F1 score. Table 2 shows the evaluation results on
the test set.

Table 2. Evaluation results of the proposed methodology on the test set.

Metric Value
Accuracy 95%
Precision 93%
Recall 97%
F1 Score 0.95

The proposed methodology achieved high accuracy and efficiency in detecting anoma-
lous behavior in IVNs. The results demonstrate the effectiveness of the data preprocessing,
feature extraction, and deep learning-based anomaly detection techniques used in the
proposed methodology.
To further evaluate the performance of the proposed methodology, we compare it with
three baseline methods: support vector machine (SVM), random forest (RF), and k-nearest
neighbors (KNNs). Table 3 and Figure 4 show the performance metrics of these methods,
including accuracy, precision, recall, and F1 score. Several studies have used support vector
machine, random forest, and k-nearest neighbors as baseline methods for anomaly detection
in network traffic data, such as Wang et al. [22], Yang et al. [23], and Chen et al. [24]. The
results of these studies indicate that these methods can achieve reasonable performance
in detecting network anomalies. However, these methods are not specifically designed
for IVNs and may not be optimal for detecting anomalies in this context. The proposed
methodology outperforms all three baseline methods in all metrics, with an accuracy of
0.95, precision of 0.93, recall of 0.97, and F1 score of 0.95. The results demonstrate the
effectiveness of the proposed methodology in detecting anomalous behavior in IVNs. The
high accuracy and F1 score indicate that the model is able to correctly identify both normal
Electronics 2024, 13, 1962 10 of 13

and anomalous traffic, while the high precision and recall indicate that the model has
a low rate of false positives and false negatives, respectively. We have also compared
our proposed methodology with recent sudies as shown in Table 3, and our proposed
technique results are very good compared to all these other recent techniques. The superior
performance of the proposed methodology compared to the baseline methods suggests
that the combination of data preprocessing, feature engineering, and deep learning-based
anomaly detection is a promising approach for detecting anomalous behavior in IVNs.

Figure 4. Results comparison with baseline methods.

Table 3. Results comparison with baseline methods.

Method Accuracy (%) Precision (%) Recall (%) F1 Score


Proposed Approach/Method 95 93 97 0.95
Support Vector Machine 83 79 87 0.83
Random Forest 88 86 90 0.88
K-Nearest Neighbors 75 71 80 0.75

The confusion matrix for the proposed methodology is shown in Table 4. The matrix
shows the number of true positives (TPs), true negatives (TNs), false positives (FPs),
and false negatives (FNs) for the proposed methodology. The model correctly classifies
1975 network packets as normal (TP) and 1892 network packets as anomalous (TN). The
model also misclassifies 72 normal packets as anomalous (FP) and 61 anomalous packets as
normal (FN). The overall accuracy of the proposed methodology is 0.95, which indicates
that the model is able to detect anomalous behavior in IVNs with high accuracy.
In addition to the performance metrics, we also analyzed the feature importance of
the proposed methodology. Figure 2 shows the relative importance of each feature in the
model, ranked in descending order. The most important features are the packet size, the
time duration between packets, and the source and destination IP addresses. This analysis
provides insight into which features are most relevant for detecting anomalous behavior in
IVNs, which can inform future research and development of more efficient and accurate
anomaly detection models. Overall, the results demonstrate the potential of the proposed
methodology for detecting anomalous behavior in IVNs. The high accuracy, precision,
recall, and F1 score, as well as the feature importance analysis, suggest that the combination
of data preprocessing, feature engineering, and deep learning-based anomaly detection is a
promising approach for improving the security and safety of IVNs.
Electronics 2024, 13, 1962 11 of 13

Table 4. Comparison with recent studies.

Method Accuracy (%) Precision (%) Recall (%) F1 Score


Proposed Approach/Method 95 93 97 0.95
Wang et al. [22] 90 88 92 0.90
Yang et al. [23] 85 82 87 0.85
Chen et al. [24] 87 85 89 0.87

4.7. Discussion of Results


The proposed methodology for anomaly detection in IVNs achieved an accuracy of
0.95, which outperformed all the baseline methods. The precision and recall of the proposed
methodology were also higher than those of the baseline methods. These results demon-
strate the effectiveness of the proposed deep autoencoder model in detecting anomalous
behavior in IVNs. One interesting observation from the confusion matrix (Table 3) is that
the model misclassified a small number of normal packets as anomalous (FP) but did
not misclassify any anomalous packets as normal (FN). This is a desirable characteristic
for anomaly detection models, as it is generally more acceptable to have false positives
than false negatives in such applications. The feature importance analysis in Figure 1
showed that the packet size, time duration between packets, and source and destination IP
addresses were the most important features for detecting anomalous behavior in IVNs.
The proposed methodology has several potential applications in the automotive
industry. For example, it can be used to monitor the network traffic of autonomous vehicles
and detect any anomalous behavior that may compromise the safety and security of the
vehicle and its passengers. It can also be used in the development and testing of IVNs to
ensure that they are functioning properly and are resilient to various network attacks and
malfunctions. There are several limitations to the proposed methodology that should be
addressed in future research. One limitation is the lack of a large and diverse dataset for
training and testing the model. The current dataset contains only 10,000 network packets,
which may not be representative of all possible network scenarios and anomalies. Another
limitation is the use of a deep autoencoder model, which may not be the most optimal model
for detecting anomalies in IVNs. It is due to the complexity of IVN data, limited labeled
data for unsupervised learning, potential shortcomings in capturing relevant features for
anomaly detection, and the lack of interpretability, which can hinder effective diagnosis and
mitigation of false positives or false negatives. Other types of deep neural networks, such as
convolutional neural networks and recurrent neural networks, should also be investigated.
Overall, the proposed methodology has several potential applications in the auto-
motive industry, but further research is needed to address its limitations and improve
its performance.

5. Conclusions
The in-vehicle networks play a critical role in modern vehicles and facilitate communi-
cation between various electronic components. However, with the increasing complexity
and interconnectedness of IVNs, the risk of cyber-attacks and compromise of safety and
privacy are also on the rise. To address this issue, anomaly detection techniques have been
proposed as an effective means of detecting potential threats and preventing cyber-attacks
in real-time. The proposed machine learning-based anomaly detection technique, which
employs deep learning and feature engineering, has shown promising results in identify-
ing anomalous behavior in IVNs. Feature engineering involves selecting and extracting
relevant features from the data, while deep learning uses neural networks to learn complex
patterns and relationships in the data. The experiments conducted in this study demon-
strate that the proposed technique outperforms existing state-of-the-art methods in terms of
accuracy in detecting anomalies. Overall, this technique can significantly enhance the secu-
rity of IVNs and prevent cyber-attacks that could potentially harm drivers and passengers.
Electronics 2024, 13, 1962 12 of 13

Further research in this area can help improve the robustness and effectiveness of anomaly
detection techniques for IVNs and ensure the safety and security of modern vehicles.

Author Contributions: Conceptualization, A.A. and D.B.R.; Methodology, A.A.; Software, A.A.;
Formal analysis, A.A.; Investigation, D.B.R.; Resources, D.B.R.; Data curation, A.A.; Writing—original
draft, A.A.; Writing—review & editing, D.B.R.; Supervision, D.B.R.; Project administration, D.B.R.;
Funding acquisition, D.B.R. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: The data can be shared up on request.
Conflicts of Interest: The authors declare no conflicts of interest.

References
1. NHTSA. Cybersecurity Best Practices for Modern Vehicles. National Highway Traffic Safety Administration. 2022. Available
online: [Link]
final-tag_0_0.pdf (accessed on 3 September 2022).
2. Mahbod, A.; Dehghantanha, A.; Choo, K.K.R.; Conti, M. Applications of machine learning in cybersecurity. Comput. Secur. 2019,
83, 48–65.
3. Auto-ISAC. Best Practices Auto-ISAC. 2021. Available online: [Link] (accessed on 15 July 2016).
4. Yoshizawa, T.; Singelée, D.; Muehlberg, J.T.; Delbruel, S.; Taherkordi, A.; Hughes, D.; Preneel, B. A Survey of Security and Privacy
Issues in V2X Communication Systems. ACM Comput. Surv. 2023, 55, 1–36.
5. Chen, J.; Wang, Y.; Zhu, L.; He, J. In-vehicle network anomaly detection using wavelet transform and unsupervised clustering.
IEEE Trans. Intell. Transp. Syst. 2018, 19, 869–881.
6. Li, Y.; Chen, H.; Zeng, X.; Huang, L. A PCA-based anomaly detection method for in-vehicle networks. IEEE Access 2018,
6, 23616–23624.
7. Pires, J.R.S.; Vieira, M.A.M.; Almeida, J.M. Anomaly detection in in-vehicle networks using a rule-based approach. IEEE Trans.
Veh. Technol. 2019, 68, 3284–3296.
8. Wang, Y.; Sun, L.; Chen, J.; Yu, H. Anomaly detection in in-vehicle network traffic using Gaussian mixture model. IEEE Trans.
Intell. Transp. Syst. 2018, 19, 776–786.
9. Chen, Y.; Ma, H.; Li, B.; Li, D. Anomaly detection for in-vehicle networks based on principal component analysis and support
vector machine. IEEE Trans. Intell. Transp. Syst. 2021, 22, 2995–3006.
10. Li, H.; Lin, Y.; Li, C.; Li, Y.; Li, H. Anomaly Detection in In-Vehicle Networks Based on Wavelet Transform and LSTM. IEEE Trans.
Veh. Technol. 2021, 70, 2903–2916.
11. Zhang, X.; Li, Y.; Li, Q. Anomaly detection in in-vehicle networks using clustering algorithms. IEEE Access 2019, 7, 106539–106549.
12. Zong, Y.; Zhang, Z.; Li, Y. Deep Auto-Encoder-Based Anomaly Detection in In-Vehicle Networks. IEEE Trans. Veh. Technol. 2019,
68, 3389–3400.
13. Wang, C.; Zhao, Z.; Gong, L.; Zhu, L.; Liu, Z.; Cheng, X. A distributed anomaly detection system for in-vehicle network using
HTM. IEEE Access 2018, 6, 9091–9098. [CrossRef]
14. Little, R.J.; Rubin, D.B. Statistical Analysis with Missing Data; John Wiley and Sons: Hoboken, NJ, USA, 2014.
15. Chandola, V.; Banerjee, A.; Kumar, V. Anomaly detection: A survey. ACM Comput. Surv. (CSUR) 2009, 41, 1–58. [CrossRef]
16. Kalkan, S.C.; Sahingoz, O.K. In-vehicle intrusion detection system on controller area network with machine learning models. In
Proceedings of the 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT),
Kharagpur, India, 1–3 July 2020.
17. Deng, Z.; Yan, J.; Cao, J. A deep autoencoder-based approach for traffic anomaly detection in intelligent transportation system.
IEEE Access 2019, 7, 107298–107308.
18. Xie, K.; Chen, S.; Lu, K. Traffic anomaly detection based on a deep autoencoder network in vehicular ad hoc networks. IEEE
Access 2021, 9, 15041–15051.
19. Cheng, Y.; Li, T.; Wu, L.; Yang, Y. A deep learning approach for anomaly detection in maritime traffic. J. Navig. 2019, 72, 373–388.
20. Gan, Q.; Wang, X.; Yang, X. A deep learning framework for network anomaly detection in intelligent transportation systems.
IEEE Trans. Intell. Transp. Syst. 2020, 21, 126–137.
21. Wang, B.; Yu, S.; Du, Y. An improved deep auto-encoder network for anomaly detection in the internet of things. IEEE Internet
Things J. 2021, 8, 10523–10531.
22. Wang, L.; Zhang, Y.; Zhang, Z. Anomaly detection in network traffic based on one-class SVM. J. Phys. Conf. Ser. 2017, 902, 012034.
Electronics 2024, 13, 1962 13 of 13

23. Yang, Z.; Zheng, S.; Gao, S.; Tang, X. Network anomaly detection based on random forests. In Proceedings of the 2018 IEEE 3rd
International Conference on Image, Vision and Computing (ICIVC), Chongqing, China, 27–29 June 2018; pp. 874–878.
24. Chen, Y.; Xu, M.; He, Y. KNN-based deep learning for network anomaly detection. IEEE Access 2020, 8, 65597–65607.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.

You might also like