Smartphone Image-Based Road Damage Detection in Low-Light Conditions
Abstract—Road damage occurs due to various factors and can cause negative impacts such as increased accidents, vehicle operating costs, decreased transportation efficiency, and reduced access to critical services. In this work, a novel dataset with a total of one thousand one hundred and seventy samples was developed using a smartphone camera in order to create an object detection model for detecting road damage classes such as pothole, unpaved, and normal in low-light conditions. Image preprocessing techniques were used to enhance the low-light images, and the dataset was categorized into before-enhancement and after-enhancement versions so that the results obtained with both could be compared. The YOLOv7 object detection model proved capable of handling the challenges present in on-site images of road damage in low-light conditions, and the proposed method demonstrated high performance in detecting multiple damages on roads. The results indicated a mean average precision (mAP) of 0.672 before enhancement and an mAP of 0.764, at an IoU threshold of 50%, after enhancement.

Keywords—Road damages, Potholes, Unpaved, Image preprocessing, Deep Learning, Object Detection.
I. INTRODUCTION
Road damage is a pervasive issue affecting both developed and developing countries across the world. It refers to the deterioration of roads caused by natural wear and tear, as well as environmental factors and human activities. Road damage can include potholes, cracks, ruts, and other forms of pavement distress, which can impair the quality of roads and their functionality. The effect of road damage in the real world can be significant, both economically and socially. In terms of economic impact, road damage can lead to higher vehicle operating costs, reduced fuel efficiency, and increased maintenance costs for both vehicles and road infrastructure. Furthermore, damaged roads can negatively impact trade and commerce, as well as transportation and logistics, which can have knock-on effects on supply chains and the wider economy. Socially, road damage can affect the safety and well-being of individuals and communities. Poor road conditions increase the risk of accidents, injuries, and psychological trauma, compromising people's well-being [19]. Moreover, road damage can also reduce access to essential services and social amenities, such as healthcare, education, and emergency services, thereby impacting the quality of life of people living in affected areas. Given these significant effects of road damage, there is a growing need to study and understand its causes and implications. Research in this area can help to identify effective strategies and solutions for maintaining and improving road infrastructure and mitigating the negative impact of road damage on the economy and society.

The challenge of detecting road damage early on-site is a significant issue in current research. Early detection can enable proactive measures for constructors and improve road longevity. This requires collaboration between disciplines such as civil engineering, transportation engineering, and materials science. Advancing road damage detection and maintenance will lead to effective intervention strategies and sustainable infrastructure management, and further research in this area promotes interdisciplinary collaboration and innovation in engineering and materials science.

This research aims to create a deep-learning model that can effectively detect road damage using on-site images. The study offers a substantial contribution through the utilization of a dataset comprising one thousand one hundred and seventy infield images captured during night-time from a motorbike travelling at speeds ranging from 15 to 25 kilometers per hour. These images were acquired under diverse road conditions, including challenging night-time environments. The inherent difficulty of the night-time images, characterized by severe darkness, motion blur, and high levels of noise, posed significant challenges for the analysis. Nevertheless, the model trained on this data demonstrated promising results, prompting the adoption of preprocessing techniques; preprocessing and retraining improved the model significantly. The YOLOv7 object detection model demonstrated robust performance in detecting damages on normal and unpaved roads. The evaluation included comparing before-enhanced and after-enhanced images, highlighting the effectiveness of the proposed approach. These findings emphasize the promising prospects of utilizing deep learning models in road damage detection and analysis, including detecting damage using computer vision in autonomous vehicles [10], which can significantly improve road infrastructure maintenance and management.
The research commenced three months ago with an extensive literature review encompassing a wide range of related papers. Existing studies predominantly concentrated on road damage detection during daylight hours, leaving a notable gap in knowledge regarding identifying road damage under low-light conditions, a critical period characterized by heightened driving risks. Integrating computer vision, machine learning, and deep learning methodologies for road damage identification emerged as a recurring theme in the literature, but due to the limited research on low-light conditions, further investigation is required. To bridge this gap, the study focuses on assessing the efficacy of current object identification algorithms in low-light scenarios. A crucial step in the research involved sourcing open datasets; however, this revealed a significant dearth of available low-light road damage images, with most existing datasets centered around daylight conditions. This scarcity further underscores the need for research into the challenges and opportunities presented by low-light conditions in road damage detection.

The research scope of this paper includes collecting data for object detection on road damages posing challenges in nighttime conditions. Daylight conditions involve issues like shadows, glare, and varying contrast, while nighttime conditions introduce low lighting, noise, a limited illumination range, and additional challenges related to shadows caused by vehicle reflections. Strategies such as image preprocessing help address these challenges for accurate object detection in different lighting conditions.

The paper follows a structured organization, beginning with an introduction that outlines the problem statement, emphasizes its importance, and sets the stage for the subsequent sections. The related work section conducts a thorough examination of existing literature, drawing insights and incorporating relevant information into the proposed approach; it builds upon the findings from previous studies, highlighting their influence on the current research. The proposed system section provides a comprehensive overview of the research components, including detailed explanations of the dataset, the chosen architecture, and the methodologies employed, offering a clear understanding of the approach adopted in the study. Four subcategories are explored: dataset, pre-processing, experimental setup, and object detection/recognition models. The experimental results section presents the outcomes, focusing on object detection and recognition. Finally, the conclusion section underlines their significance, and the paper closes with a comprehensive list of references cited throughout the study.

II. RELATED WORKS

The timely detection and accurate identification of potholes on roads are vital for public safety and infrastructure maintenance. However, detecting potholes in low-light conditions, especially at night, is challenging due to limited visibility and poor image quality, and existing literature and datasets for low-light road damage are scarce. This research addresses this gap by evaluating object identification algorithms on low-light road damage images and proposes pre-processing techniques to enhance low-light images, resulting in improved detection. A focus on low-light conditions and data enhancement techniques contributes to advancements in road damage detection in challenging lighting scenarios. A survey of the literature revealed several studies that focus on road damage detection and class recognition; these works primarily center around identifying and detecting various types of road damage.

Heo et al. [1] proposed a dataset of 665 annotated road damage images hosted on Kaggle. Data augmentation techniques, including gamma adjustment, horizontal flips, and scaling, were employed to preprocess the dataset, compensating for the limited data, improving adaptability to varying lighting conditions and image qualities, and resulting in 2,665 augmented images. A single pothole class was detected and recognized in a comparison study: YOLOv2 achieved 83% precision, 74% recall, and a 78.2% F1-score at 26 FPS; YOLOv3 achieved 88% precision, 72% recall, and a 79.2% F1-score; YOLOv4-tiny achieved 74% precision, 76% recall, and a 72.5% F1-score; and the proposed SPFPN YOLOv4-tiny achieved 89% precision, 84% recall, and an 86.4% F1-score. Bučko et al. [2] worked with the various existing pothole detection datasets available for research and development, including MakeML (665 images), the MIIA Pothole Dataset (2,459 images), the Road Damage Dataset (9,053 images), Road Surface Damages (18,345 images), the Pothole Detection Dataset (1,243 images), RDD2020 (26,336 images), and RDD2022 (38,385 images); techniques such as image resizing, normalization, data augmentation, label encoding, and noise removal may have been applied to prepare the images. Their YOLOv3 model achieved a detection mAP of 0.771 overall, 0.505 in rainy conditions, 0.529 in sunset conditions, and 0.175 in night conditions, detecting and recognizing potholes and manhole covers. Pham et al. [4] worked on a dataset that includes a training set and two test sets, containing a total of 21,041 images and 34,702 real-world labels for box boundaries and damage categories. The pre-processing phase involved several techniques: firstly, the images were scaled down to a standardized size of 640x640 pixels; secondly, the annotations were cleaned, removing irrelevant or invalid entries. The proposed model's highest accuracies were achieved on the United States and Japan benchmarks, with F1-scores of 0.817 and 0.735, respectively, while India and Norway had comparable F1-scores of around 0.504, recognizing longitudinal cracks, transverse cracks, alligator cracks, and potholes. Asad et al. [5] state that each image in their dataset contains multiple potholes, resulting in approximately 8,000 potholes in the entire dataset. The detection of potholes was carried out using an AI kit (OAK-D) connected to a Raspberry Pi single-board computer, which served as the edge platform; the Tiny-YOLOv4, YOLOv4, and YOLOv5 models achieved high mean average precision (mAP) scores of 80.04%, 85.48%, and 95%, respectively, for pothole detection. Yang et al. [6] used a dataset consisting of 1,994 training samples and 2,013 validation samples, encompassing transverse cracks, longitudinal cracks, and map cracks. Their preprocessing methods used guided filtering and Retinex, improving crack image quality by reducing noise and enhancing brightness.
Their proposed YOLO-SAMT method achieves the highest accuracy of 89.43% on transverse cracks, longitudinal cracks, and map cracks, and their crack extraction algorithm achieves a high detection accuracy of 96.67%, outperforming other traditional image extraction algorithms in crack recognition. Khaled R. Ahmed [11] used a dataset comprising a collection of 665 images in total with 2,139 pothole annotations. The preprocessing steps in this work include data augmentation techniques such as scaling, color adjustments, rotation, and mosaic augmentation, which enhance the effectiveness of deep learning models for spotting potholes; image resizing was performed for YOLOv5. In detection and recognition, the YOLOv5l model achieved an accuracy of 58.9%, while Faster R-CNN with an MVGG16 backbone achieved an mAP of 44.6%. Chitale et al. [12] used a dataset consisting of approximately 1,300 images of roads, including waterlogged and dry potholes. Preprocessing steps included image resizing, normalization, data augmentation, and object localization. YOLOv3 achieved an accuracy of 0.889 and YOLOv4 achieved 0.933 for pothole detection, indicating that YOLOv4 produced more accurate bounding box predictions and better pothole detection results than YOLOv3. Kumar et al. [13] used a pothole dataset consisting of 1,500 images; a labeled dataset was prepared with pothole coordinates, and image cropping and TFRecords were used for fine-tuning Inception v2. The system detected potholes effectively in videos and images, with potential for improvement. A pothole detection system using Faster R-CNN was proposed, utilizing foreign road data; however, R-CNN models have longer prediction times, and the use of foreign data may not be suitable for Indian roads due to differing conditions. Hegde et al. [14] used a dataset containing a total of 21,041 images, with each image annotated to identify one or more classes of road damage. Data was augmented using various image processing techniques, including sharpening, multiplication, additive Gaussian noise, and affine texture mapping, and hyperparameters were fine-tuned during training. In the experimental results, u-YOLO with EM+EP demonstrated an F1-score of up to 0.67, and the approach showed good results in detecting and recognizing types of road damage. Ukhwah et al. [15] used a dataset of 2,342 highway images acquired with a Hawkeye 2000 survey vehicle equipped with a Pavement View camera; videos were converted to images using the Hawkeye processing toolkit software, and pothole images were selected and annotated with bounding boxes, including class IDs and coordinates. The study achieved satisfactory results in pothole detection and recognition using the YOLO neural network, with typical mAP values for YOLOv3, YOLOv3 Tiny, and YOLOv3 SPP of 83.43%, 79.33%, and 88.93%, respectively. E. N. Sharma et al. [16] collected data by driving through various areas in the NCR region and recording accelerometer traces; the dataset included different road conditions and densities of potholes in specific locations such as Noida Sector 122 and the Noida-Greater Noida Expressway. Preprocessing techniques, including filtering, segregation, feature extraction, and calibration, improved the accuracy. The system successfully detected and categorized various road anomalies, such as potholes, manholes, and turning corners, from the accelerometer traces recorded during driving sessions. Rateke et al. [17] used a dataset comprising road images with diverse surface types; it contains over 77,000 frames captured using a low-clarity camera mounted on a vehicle, and the preprocessing techniques applied may include extracting a Region of Interest (ROI) from each frame and applying data augmentation to simulate brightness variations. Their system detects road surface types with an accuracy of 94.57%, and the road surface quality recognition additionally utilizes three classes for asphalt and paved surfaces: Good, Regular, and Bad. Akula et al. [18] collected a dataset comprising thermal images of roads both with and without potholes, and used preprocessing techniques on the thermal images, such as cropping, resizing, and augmentation, to improve recognition using convolutional neural networks; they achieved a detection accuracy of 97.08% using convolutional neural networks on thermal images, recognizing the class and detecting potholes. Maeda et al. [20] used a dataset consisting of 9,053 road damage images captured using a smartphone, with data augmentation techniques applied for increased diversity. The dataset has been made publicly available, with privacy concerns addressed by blurring sensitive information. The model achieved recall and precision values of over 71% and 77% for various road damage classes, such as linear cracks, alligator cracks, rutting, and blur, and it ran on a smartphone with a 1.5-second inference time. Lee et al. [21] tested their system on highway video captured using a smartphone camera; the video had a resolution of 1920x1080 at 30 fps and lasted around 1 minute and 30 seconds, with 38,137 frames. They used preprocessing techniques including grayscale conversion and superpixel clustering for segmentation analysis; these techniques enhance accuracy by identifying dark, round, and rugged pothole characteristics. The work presents an automatic system for detecting potholes based on image processing techniques using a camera sensor, and it achieved a 93.06% accuracy in detecting and recognizing potholes. Alfarrarjeh et al. [22] collected a dataset of 9,047 samples and used data augmentation and cropping as preprocessing techniques to enhance the training dataset for detection. Their deep-learning-based detection and classification solution received a maximum F1-score of 0.62, recognizing eight types of road damage, including several kinds of cracks and other defects such as potholes. Vigneshwar et al. [23] primarily focus on the image processing and segmentation methods employed for pothole detection. They implemented image pre-processing methods such as resizing, grayscale conversion, median filtering, and difference-of-Gaussian filtering, and compared different segmentation techniques, including edge detection, thresholding, K-Means clustering, and Fuzzy C-Means clustering, for the purpose of pothole detection. Thresholding achieved 80.6% accuracy.
Edge detection achieved 90.2% accuracy, K-Means clustering achieved 82.5% accuracy, and Fuzzy C-Means clustering achieved 82.5% accuracy. These results help in selecting the most suitable technique for pothole identification based on accuracy, sensitivity, specificity, and computation time. Jo et al. [24] focus on the development of a technique for spotting potholes that combines a black-box camera with a new detection algorithm. The preprocessing techniques used include cropping the image, converting it to grayscale, and Otsu's thresholding for segmenting pothole areas from the background and similar objects. The algorithm's sensitivity (true-positive rate) is 71% and its precision (positive predictive value) is 88%; that is, 71% of all actual potholes were accurately identified, while 88% of all detected potholes were correct. Huidrom et al. [25] tested their algorithm on a dataset consisting of a total of 1,275 video frames showing distress, randomly selected from the results obtained by applying the DFS algorithm to different video clips. Image enhancement improves image quality, while segmentation divides the images into meaningful regions. The proposed algorithm achieves high accuracy in detecting and recognizing frames with potholes and cracks, with 97% overall accuracy, 95% precision, and 81% recall for potholes, and 94% overall accuracy, 93% precision, and 98% recall for cracks. However, it has lower performance in detecting frames with patches, with 90% overall accuracy, 8.5% precision, and 19% recall.

The existing literature on road damage detection and pothole identification utilizes deep-learning algorithms and datasets. Preprocessing techniques enhance model performance, and notable models include YOLOv2, YOLOv3, YOLOv4, Tiny-YOLOv4, YOLOv5, Faster R-CNN, and Inception v2. Studies use datasets of various sizes, with annotations for different road damage types, and data augmentation improves dataset diversity for varied lighting conditions and image qualities. Performance evaluation of the models utilizes criteria such as accuracy, recall, F1-score, and mean average precision (mAP). Results vary, with F1-scores ranging from 70% to over 90%, and pothole detection studies achieve high mean average precision scores of 80% to 95%. The present study addresses the novel challenge of detecting road damage and identifying potholes in low-light conditions, which has not been extensively explored in prior research. We have thoroughly reviewed existing papers on this topic and found that no previous studies have specifically focused on object detection of road damage in nighttime conditions. As a result, we propose a novel approach to tackle this problem and achieve significant results, despite encountering numerous challenges.
III. PROPOSED METHOD

We propose a system for the detection and identification of road damage, specifically focusing on potholes, in low-light conditions. The datasets were recorded using a smartphone camera mounted with a mobile stand on a two-wheeler motor vehicle, capturing videos at a resolution of 720p and converting each 60-second video segment into keyframes. The data collection took place during nighttime, between 10:00 PM and 2:00 AM, over the course of a week, while traveling at speeds ranging from 15 to 25 kilometers per hour. The captured video segments were of long duration, resulting in the extraction of a substantial number of keyframes; in total, more than 80,000 images were extracted. To identify the exact damaged road sections, a meticulous process of manually examining individual images was conducted, and from this extensive collection a subset of 1,170 images was selected as representative samples for further analysis. Due to challenging conditions such as complete darkness, poor image clarity, and excessive noise, some images depicting damage did not meet the required quality standards and were excluded from the final selection. The images were captured in unpredictable environmental conditions, posing difficulties in terms of lighting and focus.
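As a concrete illustration of the keyframe-extraction step, the sketch below samples frames from a recorded ride video with OpenCV. It is a minimal sketch under stated assumptions: the paper does not report its exact sampling rate, so the one-frame-per-second interval, file paths, and naming scheme here are illustrative.

```python
import cv2
from pathlib import Path

def extract_keyframes(video_path: str, out_dir: str, every_n_sec: float = 1.0) -> int:
    """Sample frames from a 720p ride video at a fixed time interval.

    The sampling interval is an assumption; the paper extracts keyframes
    from 60-second segments but does not state the exact rate.
    """
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0   # fall back if FPS metadata is missing
    step = max(1, int(round(fps * every_n_sec)))
    Path(out_dir).mkdir(parents=True, exist_ok=True)

    saved, idx = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:                    # keep one frame per interval
            cv2.imwrite(f"{out_dir}/{Path(video_path).stem}_{idx:06d}.jpg", frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```

Manual inspection of the saved frames, as described above, then narrows the raw extraction down to the representative damaged-road samples.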
The research primarily investigates the robustness and achievements of the YOLOv7 model [7]. YOLO is a one-stage detector that directly generates bounding boxes from input photos without the need for a separate object region proposal step. The YOLOv7 work discusses the performance of YOLOv5, YOLOv6, and YOLOv7 in terms of accuracy, inference speed, and GPU utilization. Previous versions of YOLO, like YOLOv3, have been reported to have lower recall and more localization errors compared to methods like Faster R-CNN, and to struggle with detecting small objects and groups of closely located objects due to the limited number of bounding box proposals per grid cell. YOLOv5 and YOLOv6 are two distinct versions, each with its own strengths: while YOLOv5 offers greater stability and customization options using YAML, MT-YOLOv6 excels in detecting small objects in densely packed environments. The research presents these findings, highlighting the evolution and improvements made in each YOLO version, along with a comparative analysis of their performance against other object detection approaches.

The present research focuses on a model trained on carefully curated road detection datasets, with a particular emphasis on identifying damages such as potholes and unpaved areas. The training process involves annotating the actual damage types in on-site images collected from diverse regions, and these annotated datasets serve as the foundation for the model's learning process. The methodology encompasses dataset collection, system configuration for the deep learning model, and training, testing, and validation stages. To assess the effectiveness of YOLOv7, a comparison is conducted between its performance before and after enhancement; the model undergoes training and evaluation on the validation and testing datasets, enabling a comprehensive assessment of its capabilities. YOLOv7 excels in real-time road damage detection, surpassing previous versions and setting a new benchmark. Its accuracy, speed, and stability make it the preferred choice for this research, advancing the field of road damage detection with the proposed model, as shown in Fig. 1.

Fig. 1. Architecture of the Proposed Model
The architecture of the proposed model involves a series of steps to facilitate effective object detection and recognition. The process begins with image acquisition, where data is collected and extracted into frames. Preprocessing techniques such as increasing brightness and contrast levels, reducing noise, and sharpening the extracted images are applied to enhance their quality. Following this, classes are labeled using annotations and bounding boxes to provide clear identification. After that, the dataset is divided into training, testing, and validation sets, employing a shuffle technique to ensure randomness: an 80:20 split is used, with 80% of the data going towards training and the remaining 20% divided between the test and validation sets. Subsequently, the prepared dataset is used to train the YOLOv7 model, and the model's performance is evaluated based on its ability to detect and recognize objects accurately. The results obtained from this process contribute to the assessment and improvement of the proposed architecture, enhancing its potential for object detection and recognition tasks.

A. Dataset

The dataset in this research comprises a diverse collection of images capturing instances of road damage observed in Mysuru, located in Karnataka, India. The data acquisition process involved using a smartphone camera mounted on a two-wheeler vehicle, allowing for the capture of high-definition videos at a resolution of 720p. The images were obtained under challenging environmental conditions characterized by unpredictable lighting and focus issues. To facilitate comprehensive analysis and evaluation, the dataset was meticulously separated into distinct subsets for training, testing, and validation purposes, ensuring representative samples were available for each study stage. Samples of the annotated class labels are presented in Fig. 2, and statistics of the annotated samples for the pothole, unpaved, and normal classes are listed in Table I.

TABLE I
CLASS-WISE STATISTICS OF ANNOTATIONS

Class Label   Train   Test   Validation   Total
Normal          403     40           59     502
Unpaved         657     88           81     826
Potholes        386     47           62     495
Total          1446    168          202    1816

The table provides class-wise statistics of the annotations in the dataset used in the study, which consists of three classes: Normal, Unpaved, and Potholes. The training set contains 403 annotations for the Normal class, 657 for the Unpaved class, and 386 for the Potholes class. The test set contains 40 annotations for the Normal class, 88 for the Unpaved class, and 47 for the Potholes class. The validation set consists of 59 annotations for the Normal class, 81 for the Unpaved class, and 62 for the Potholes class. There are 1,446 annotations in the training set, 168 in the test set, and 202 in the validation set; overall, the dataset comprises 1,816 annotations.

As part of the preprocessing phase, a crucial step involved resizing the images to a standardized resolution of 640 x 640 pixels. This resizing ensured uniformity and facilitated efficient processing of the dataset. The resized images were then subjected to manual annotation using YoloLabel v1.1.1, a popular annotation tool in the field of object detection.
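The shuffle-and-split step described above can be scripted as follows. This is a minimal sketch, assuming images and their YOLO label files sit side by side and using an 80:10:10 ratio consistent with the annotation counts in Table I; the directory names and random seed are illustrative assumptions.

```python
import random
import shutil
from pathlib import Path

def split_dataset(image_dir: str, out_root: str, seed: int = 42,
                  ratios=(0.8, 0.1, 0.1)) -> None:
    """Shuffle images and split them into train/test/validation folders."""
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)          # shuffle for randomness

    n = len(images)
    n_train = int(ratios[0] * n)
    n_test = int(ratios[1] * n)
    buckets = {
        "train": images[:n_train],
        "test": images[n_train:n_train + n_test],
        "val": images[n_train + n_test:],
    }
    for name, files in buckets.items():
        dest = Path(out_root) / name
        dest.mkdir(parents=True, exist_ok=True)
        for img in files:
            shutil.copy(img, dest / img.name)    # copy the image
            label = img.with_suffix(".txt")      # and its matching YOLO label
            if label.exists():
                shutil.copy(label, dest / label.name)
```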
During the annotation process, trained annotators carefully examined each image, focusing specifically on road sections where meaningful conclusions regarding damage could be derived. To accurately capture the spatial extent of the damaged regions, bounding boxes were meticulously placed around the relevant areas, and their corresponding coordinates were recorded. These annotations, saved in a YOLO-compatible format (.txt), provided the necessary ground truth for training and evaluating the YOLOv7 model. This meticulous annotation process forms a critical foundation for achieving accurate and precise object detection in road damage detection applications.
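For reference, the YOLO-compatible .txt format stores one object per line as a class index followed by the box center coordinates and box size, all normalized to the image dimensions. The helper below is a small sketch; the class-index mapping in the comment is an assumption, since the paper does not state it.

```python
# One line per object: <class_id> <x_center> <y_center> <width> <height>,
# with all coordinates normalized by image width and height.
# Assumed class mapping for this dataset: 0 = normal, 1 = unpaved, 2 = pothole.
def to_yolo_line(class_id: int, box: tuple, img_w: int, img_h: int) -> str:
    """Convert a pixel-space box (x_min, y_min, x_max, y_max) to a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

# e.g. a pothole box on a 640x640 image:
print(to_yolo_line(2, (120, 400, 260, 480), 640, 640))
# -> "2 0.296875 0.687500 0.218750 0.125000"
```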
The dataset used in this study plays a pivotal role in training, validating, and evaluating the performance of the YOLOv7 model designed specifically for road damage detection. With meticulous curation and extensive annotation, the dataset encompasses a wide range of road damage characteristics observed in various real-world conditions, including cracks, potholes, and pavement deterioration. This comprehensive representation of road damage instances enhances the model's robustness and ensures its relevance in accurately detecting and classifying diverse types of road damage. By incorporating such diverse instances, the dataset enables the study to draw meaningful conclusions, derive valuable insights, and contribute to the advancement of road infrastructure maintenance and safety. The availability of a rich and diverse dataset further enables researchers and practitioners to assess and contrast the effectiveness of different detection algorithms and techniques, fostering advancements in road damage identification and management. With its significance and potential impact, the dataset serves as a valuable resource for future research and development efforts aimed at improving road infrastructure sustainability and public safety. Original and annotated samples are presented in Fig. 2. The images include unpaved and muddy roads, unwanted miscellaneous objects on the roads and vehicles, as shown in Fig. 3(a), as well as varying backgrounds; additionally, the images were taken from varying distances.

Fig. 3. (a) Miscellaneous objects on the road, (b) motion blur along with shadows, and (c) differing lighting conditions.

B. Preprocessing

The preprocessing step in computer vision applications prepares raw input data for further analysis. In the context of low-light road damage detection using YOLOv7, an image preprocessing methodology is developed to enhance images captured in low-light conditions for damage detection, increasing sharpness and decreasing noise.

1) Adjusting Contrast and Brightness: This step involves several operations, including increasing contrast and brightness, sharpening, and noise reduction using a Gaussian filter. To increase the contrast and brightness of the images, the alpha and beta values are adjusted using the linear transformation given in (1): the output image is obtained by multiplying each input pixel by the alpha value before the beta value is added.
I_out(x, y) = α · I(x, y) + β    (1)

Equation (1) represents an image enhancement technique, where I_out(x, y) is the generated image, I(x, y) is the source image, and α and β are the variables that regulate the enhancement process. This equation allows for the adjustment of image brightness and contrast to improve visual quality.
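Below is a minimal sketch of the linear transformation in (1) using OpenCV; the α and β values shown are assumptions for illustration, as the paper does not report its tuned settings.

```python
import cv2

def adjust_contrast_brightness(img, alpha: float = 1.8, beta: float = 40.0):
    """Apply I_out = alpha * I + beta with saturation to the valid [0, 255] range.

    alpha > 1 stretches contrast and beta > 0 lifts brightness; the values
    here are illustrative, not the paper's tuned settings.
    """
    return cv2.convertScaleAbs(img, alpha=alpha, beta=beta)

enhanced = adjust_contrast_brightness(cv2.imread("night_road.jpg"))
cv2.imwrite("night_road_enhanced.jpg", enhanced)
```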
2) Unsharp Masking for Image Sharpening: Next, unsharp masking is applied to increase the sharpness of the images. This is achieved by subtracting a blurred version of the original picture from the input image and adding the result back to the original image, as given in (2); the blurred image is obtained by convolving the source picture with a Gaussian kernel [8]. This operation enhances the edges and high-frequency details of the image while suppressing the low-frequency noise.

I_sharp(x, y) = I(x, y) + (I(x, y) − I_blur(x, y))    (2)
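A minimal sketch of unsharp masking as described above, implemented with a Gaussian blur and a weighted sum; the kernel size, σ, and sharpening amount are assumed values.

```python
import cv2

def unsharp_mask(img, ksize=(5, 5), sigma: float = 1.5, amount: float = 1.0):
    """Sharpen by adding the difference between the image and its blur (eq. 2).

    Kernel size, sigma, and amount are assumptions; the paper reports the
    method but not its exact parameters.
    """
    blurred = cv2.GaussianBlur(img, ksize, sigma)
    # addWeighted computes (1 + amount) * img - amount * blurred, i.e.
    # img + amount * (img - blurred), with saturation to [0, 255].
    return cv2.addWeighted(img, 1.0 + amount, blurred, -amount, 0)

sharpened = unsharp_mask(cv2.imread("night_road_enhanced.jpg"))
```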
Evaluation of object detection assesses the quality of predicted bounding boxes by measuring their overlap with ground-truth objects. Intersection over Union (IoU) establishes the similarity between the ground-truth and predicted boxes, precision and recall measure the accuracy and completeness of object detection, and the F1-score provides a balanced evaluation of the precision and recall metrics. True positive, false positive, and false negative counts are used to calculate these metrics, which offer insights into the performance of YOLOv7 and aid in comparisons and improvements of the model. The performance metrics are calculated according to equations (4)-(6).

1) Average Precision (AP): Average Precision and mAP are metrics applied during object detection that assess the precision of the bounding box predictions by measuring their overlap with ground-truth objects. They comprehensively evaluate detection quality, considering precision, recall, and the trade-offs across multiple object classes, and offer valuable insights for analyzing and comparing detection models.

AP_c = (1/n) Σ_{r ∈ {0,...,1}} P_r    (4)

Equation (4) assesses the accuracy of detecting objects within a class c. The average precision is determined by summing the precision values P_r at the different recall levels r (ranging from 0 to 1) and dividing by the number n of recall levels sampled for the class. Precision and recall are metrics used to evaluate object detection models, measuring accuracy and completeness; average precision combines these metrics to assess the model's performance in identifying specific objects.
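Equation (4) can be realized with the classic 11-point interpolated AP, which sums interpolated precision at the recall levels {0, 0.1, ..., 1.0}; the sketch below assumes this variant, since the paper mentions 11 recall values but does not spell out the interpolation.

```python
import numpy as np

def average_precision_11pt(recalls: np.ndarray, precisions: np.ndarray) -> float:
    """11-point interpolated AP for one class (eq. 4 with n = 11).

    `recalls` and `precisions` are the points of a class's precision-recall
    curve, sorted by ascending recall; the 11-point scheme is the classic
    PASCAL VOC variant, assumed here.
    """
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 11):
        mask = recalls >= r
        # Interpolated precision: best precision at any recall >= r.
        p_interp = precisions[mask].max() if mask.any() else 0.0
        ap += p_interp / 11.0
    return ap

# Toy precision-recall curve: precision decays as recall grows.
r = np.array([0.1, 0.4, 0.7, 0.9])
p = np.array([1.0, 0.8, 0.6, 0.4])
print(f"AP = {average_precision_11pt(r, p):.3f}")
```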
2) Mean Average Precision: Mean Average Precision (mAP) is a popular metric in object detection that measures the accuracy of bounding box detection. It assesses the degree to which predictions overlap the real objects. AP is defined for each class, and mAP represents the average AP across all classes, offering a holistic evaluation of the object detection capability of the model across various categories; higher mAP values indicate superior accuracy and effectiveness in object detection.

mAP = (1/|C|) Σ_{c ∈ C} AP_c    (5)

The calculation is done by summing the average precision (AP) values of the individual classes and dividing the total by the number of classes considered. The formula iterates over all classes in the set C using the summation symbol (Σ), ensuring that the calculation is performed for each specific class, and the average precision AP_c represents the accuracy of object detection for a particular class c. By computing the mAP, a comprehensive evaluation of the system's performance across all classes is obtained. The metric enables comparing and ranking object detection models based on their accuracy in detecting objects across categories, providing a single value for average precision across multiple classes and indicating the model's effectiveness in diverse scenarios.

To evaluate the performance of an object detection model, the Average Precision (AP) metric is utilized: a precision-recall curve is created and the area under the curve is calculated to obtain the average precision for each class on the dataset, summing the precision values at each of the 11 recall values between 0 and 1 for every class i in the dataset.

3) Precision: The precision for class i at a given recall level is the proportion of accurately predicted positive cases to all of the predicted positive instances for that class. It indicates how accurate the model's predictions for class i are at that particular recall value and is calculated using the formula in (6).

P_i(r) = TP_i(r) / (TP_i(r) + FP_i(r))    (6)

The term P_i(r) is the precision for a specific class i at a given recall value r. Precision is measured as the proportion of true positives TP_i(r), the correct predictions of instances belonging to class i, among all predictions of class i, which also include the false positives FP_i(r). This equation provides a measure of the accuracy of the predictions for class i at the given recall value; a higher precision value indicates a lower rate of false positive predictions, indicating a more accurate and reliable detection performance for class i.

F. Object Recognition Metrics

Object recognition metrics are employed to assess the performance of object detection models like YOLOv7. These metrics include precision, recall, average precision (AP), and Intersection over Union (IoU). Precision evaluates how accurately objects are detected, while recall assesses the model's ability to detect all relevant objects. AP calculates the average precision across different object classes, providing an overall performance assessment, and the overlap between ground truth and predicted bounding boxes is measured by IoU. These metrics collectively gauge the accuracy, completeness, and robustness of object recognition models, aiding in their evaluation and comparison. The performance metrics are calculated according to equations (7)-(10).

1) Intersection over Union: In the evaluation process, the model's predicted bounding box counts as a true positive if it has a minimum overlap of 50% with the real-world bounding box. The Intersection over Union (IoU) method is used to calculate this overlap metric, which gauges the agreement between both boxes. The IoU metric is instrumental in assessing the model's object identification accuracy and provides valuable insights into its performance and reliability; it is computed using the formula given in (7).

IoU = Area of Intersection / Area of Union    (7)

The IoU formula calculates the Intersection over Union, a widely used metric in object detection that measures the overlap between predicted and ground truth bounding boxes. The numerator represents the intersection area and the denominator represents the union area. IoU ranges from 0 to 1, where higher values indicate better alignment, and a predefined IoU threshold determines which detections count as successful matches.

The formulas for Precision, Recall, and F1-score are the following:

2) Precision:

Precision = TP / (TP + FP)    (8)

The equation represents the calculation of precision, which measures the accuracy of object detection. It is computed by dividing the number of true positive (TP) detections by the total of true positive and false positive (FP) detections. Precision indicates the percentage of correctly identified objects among all the predicted objects, with a higher value indicating greater accuracy.

3) Recall:

Recall = TP / (TP + FN)    (9)

The equation displays the calculation of recall, which measures the completeness of object detection. The computation involves dividing the number of correctly identified positive detections by the sum of true positive (TP) and false negative (FN) detections. Recall quantifies the proportion of correctly identified objects among all the ground truth objects, with a higher value indicating greater completeness in the detection results.
4) F1-score:

F1-score = 2 × (Precision × Recall) / (Precision + Recall)    (10)

The F1-score is a metric that combines precision and recall into a single value. It is calculated as the harmonic mean of these two measures, providing a balanced assessment of model performance. By considering both false positives and false negatives, the F1-score represents the trade-off between precision and recall, indicating the accuracy and completeness of object detection results; a higher F1-score signifies a better balance between these aspects.
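The quantities in equations (7)-(10) reduce to a few lines of code. The sketch below computes IoU from corner-format boxes, and precision, recall, and F1 from raw counts; the example counts are illustrative only.

```python
def iou(box_a, box_b) -> float:
    """Intersection over Union (eq. 7) for boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def precision_recall_f1(tp: int, fp: int, fn: int):
    """Equations (8)-(10) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# A detection counts as a true positive when IoU >= 0.5, as in this paper.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))        # 25 / 175 ≈ 0.143
print(precision_recall_f1(tp=47, fp=18, fn=25))   # illustrative counts only
```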
IV. RESULTS AND DISCUSSION

The proposed model is assessed using a comprehensive set of standard metrics commonly employed in object detection tasks, which play a crucial role in assessing the performance and accuracy of the model. Precision, recall, and F1-score are fundamental metrics that measure the model's recognition achievement by evaluating the precision of the predicted bounding boxes, the model's capacity to recall the real-world objects, and their harmonic mean, respectively. Furthermore, the model's ability to accurately fit bounding boxes is evaluated using the Intersection over Union (IoU) metric, which quantifies the degree of overlap between predicted and ground truth bounding boxes, providing a measure of localization accuracy. To provide an overall assessment of the model's performance, Mean Average Precision (mAP) is computed at an IoU threshold of 50%, considering the precision and recall values across all object classes. These measures collectively offer valuable information about the effectiveness and reliability of the proposed object detection model.

Object detection and recognition are fundamental tasks in computer vision with diverse applications for estimating damages [9]. Object detection is the process of detecting and localizing objects within images or videos; it involves identifying the presence of objects and determining their spatial coordinates in the given visual data, enabling tasks like tracking. Object recognition focuses on classifying objects, aiding in scene understanding and higher-level reasoning. These tasks have broad applications, including autonomous driving, surveillance, and image understanding, and advancements in computer vision that enhance AI systems' interpretation of visual data apply across industries.

A. Object Detection Results

The object detection study evaluated two approaches: one used the original dataset, while the other applied data enhancements through pre-processing techniques. Comparing the results, the study found that the data enhancement techniques improved the object detection model's performance. These findings emphasize the effectiveness of pre-processing in enhancing object detection accuracy and suggest the potential for improved model performance through data enhancement strategies, as shown in Tables III and IV.

TABLE III
THE EFFICIENCY OF YOLOV7 ON THE SAMPLE TEST DATA BEFORE ENHANCEMENT, EVALUATED ON A CLASS-WISE BASIS

Class Label   Precision   Recall   F1-score   mAP@0.5
Normal            0.784    0.746      0.764     0.841
Unpaved           0.736    0.691      0.712     0.709
Potholes          0.628    0.435      0.513     0.467
Average           0.716    0.624      0.663     0.672

Table III presents the assessment of the YOLOv7 model's performance on the test dataset before any enhancements. The outcomes are reported on a class-wise basis for Normal, Unpaved, and Potholes: the precision, recall, and F1-score indicate the accuracy, completeness, and overall output for each class, and the average F1-score, recall, and precision across all classes are also provided. The mean average precision (mAP) at a 0.5 IoU threshold represents the overall model performance.

TABLE IV
THE EFFICIENCY OF YOLOV7 ON THE SAMPLE TEST DATA AFTER ENHANCEMENT, EVALUATED ON A CLASS-WISE BASIS

Class Label   Precision   Recall   F1-score   mAP@0.5
Normal            0.760    0.800      0.779     0.849
Unpaved           0.949    0.636      0.761     0.785
Potholes          0.639    0.617      0.627     0.657
Average           0.783    0.684      0.722     0.764

Table IV presents an overview of the YOLOv7 model's performance on the experiment's test dataset after enhancements. The outcomes are reported on a class-wise basis for Normal, Unpaved, and Potholes: the precision, recall, and F1-score indicate the accuracy, completeness, and overall performance for each class, and the average performance across all classes is also provided. The mean average precision (mAP) at a 0.5 IoU threshold represents the overall model performance. These outcomes show the effectiveness of the enhancements, which improved accuracy and the model's overall performance in detecting road damage; the predicted bounding boxes and class labels are shown in Fig. 5.

B. Object Recognition Results

Object recognition is a computer vision challenge that entails recognizing and classifying objects in images or videos. A model is trained using labeled datasets to learn features and map them to specific object classes; during inference, the model detects and classifies objects in new data based on its learned knowledge. Performance is evaluated using metrics like precision, recall, and F1-score. Object recognition has applications in autonomous vehicles, surveillance, augmented reality, and robotics.
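The before/after-enhancement comparison reported in Tables III and IV can be scripted by training and testing the same configuration on both dataset variants. This is a sketch under stated assumptions: the dataset YAML files, run names, and hyperparameters are hypothetical, and the command-line flags follow the public YOLOv7 reference implementation (train.py / test.py), so they should be checked against the actual checkout.

```python
import subprocess

# Train and evaluate the same YOLOv7 configuration twice: once on the raw
# low-light images and once on the preprocessed (enhanced) copies.
for variant in ("before_enhancement", "after_enhancement"):
    subprocess.run(
        ["python", "train.py",
         "--data", f"data/road_damage_{variant}.yaml",   # hypothetical dataset files
         "--cfg", "cfg/training/yolov7.yaml",
         "--weights", "yolov7.pt",
         "--img-size", "640", "640",
         "--epochs", "100",
         "--name", f"yolov7_{variant}"],
        check=True,
    )
    subprocess.run(
        ["python", "test.py",
         "--data", f"data/road_damage_{variant}.yaml",
         "--weights", f"runs/train/yolov7_{variant}/weights/best.pt",
         "--iou-thres", "0.5",                           # mAP@0.5, as in Tables III-IV
         "--name", f"eval_{variant}"],
        check=True,
    )
```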
Two approaches were used to evaluate the object recognition model: training with the original dataset and training with an enhanced dataset produced using pre-processing techniques. Comparing the results yielded insights into the impact of data enhancement on recognition performance, and the study demonstrates the effectiveness of pre-processing techniques in improving object recognition accuracy, as demonstrated in Tables V and VI.

TABLE V
CLASS-WISE CLASSIFICATION PERFORMANCE OF THE PROPOSED METHOD ON THE TEST DATASET BEFORE ENHANCEMENT

In Table V, the proposed method's class-wise classification performance on the test dataset is presented, including precision, recall, and F1-score metrics for three classes: pothole, unpaved, and normal. Precision measures the accuracy of classifying instances, recall indicates the ability to correctly identify instances within each class, and the F1-score balances both measures. The "Macro avg" row provides an overview of overall performance. This table offers insights into the model's classification capabilities for the different road damage classes, improving the understanding of its effectiveness in various scenarios.

TABLE VI
CLASS-WISE CLASSIFICATION PERFORMANCE OF THE PROPOSED METHOD ON THE TEST DATASET AFTER ENHANCEMENT

Fig. 6. (a) Water-covered pothole, (b) normal road with shadows, (c) unpaved road, and (d) multiple detections of road damage.

Fig. 7 shows the confusion matrix for the proposed method on the test dataset. It summarizes the model's performance by displaying the numbers of true positives, true negatives, false positives, and false negatives, and it assists in assessing each class's accuracy and error rates.
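A confusion matrix like the one in Fig. 7 can be computed from per-detection class labels, for example with scikit-learn; the label arrays below are placeholders, not the paper's actual predictions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Build a class-wise confusion matrix like the one shown in Fig. 7.
classes = ["normal", "unpaved", "pothole"]
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])   # placeholder ground-truth labels
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 1])   # placeholder predicted labels

cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
per_class_recall = cm.diagonal() / cm.sum(axis=1)   # row-normalized accuracy
for name, r in zip(classes, per_class_recall):
    print(f"{name}: recall = {r:.2f}")
```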