Traffic Congestion Estimation Using YOLO Transfer Learning-Final Paper
Hannah S
Assistant Professor, Department of Computer Science and Engineering
Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology
Avadi, Chennai, Tamil Nadu
[email protected]

Mohammed Tanzil
Student, Department of Computer Science and Engineering
Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology
Avadi, Chennai, Tamil Nadu
[email protected]
Abstract: Traffic congestion is a persistent problem in metropolises, leading to delays, excessive fuel consumption, and environmental pollution. Conventional traffic monitoring systems rely on embedded sensors and manual supervision, which do not provide real-time, scalable solutions. This paper presents a deep learning-based congestion prediction system that uses YOLO transfer learning to detect vehicles, analyze traffic density, and classify congestion levels dynamically. The model is trained on real-world datasets, making it robust under varying lighting, weather, and traffic conditions. Unlike conventional approaches, the system monitors traffic in real time, allowing authorities to make data-driven decisions. An adaptive traffic signal management method is implemented to optimize signal durations based on congestion severity, improving traffic flow efficiency and minimizing wait times at crossings. By merging computer vision and artificial intelligence, this technique optimizes urban traffic management, contributes to smart city projects, and lays the groundwork for future developments in intelligent transportation systems. Future work includes refining the model for edge computing, adding predictive traffic analytics, and scaling the system for large-scale deployment in cities.

Keywords: Traffic Congestion, YOLO, Transfer Learning, Object Detection, Smart Cities, Deep Learning

I. INTRODUCTION

Urbanization has driven a massive increase in vehicular traffic, causing serious congestion problems in metropolitan areas around the world. Rapid industrialization, a growing population, and rising vehicle ownership result in high traffic density, which critically obstructs urban mobility. Congestion not only contributes to fatal accidents but also wastes fuel and increases the environmental footprint. Excessive waiting times at intersections generate unnecessary greenhouse gas emissions, which degrade air quality and contribute to climate change. In addition, congestion disrupts public transport, causes delays, slows emergency response, and threatens public safety. These added pressures strain city infrastructure, lower economic productivity, and diminish urban dwellers' overall quality of life. Solutions to these problems must be creative and aimed at optimizing traffic flows, decreasing pollution, and making urban transport systems more robust in the face of increasing urbanization.

Traditional traffic management systems often depend on fixed sensors, manual counts, and pre-programmed signal timings that cannot respond dynamically to real-time traffic [14]. Building more bridges and flyovers in a city cannot by itself solve the problem of traffic congestion; an intelligent traffic management system must be designed to control the traffic sequence [13]. Fixed-time signal plans are among the most popular methods, but they do not adapt to varying vehicle density, so they are a highly inefficient means of controlling traffic flow.

Sensor-based systems such as inductive loops and radar-based detectors have high installation and maintenance costs, which makes wide-scale deployment unaffordable or impractical. Manual traffic monitoring, while still widely used, is time-consuming and prone to human error, which reduces its reliability under congestion. Commuters in India’s largest cities spend 1.5 hours longer on their daily trips than other travelers. Cities such as those in Asia lose up to $22 billion annually. These constraints indicate the need for intelligent traffic monitoring systems that can automatically determine road congestion and optimize road usage in real time.

Deep learning models have transformed traffic monitoring, as machine learning-based techniques now work beyond the constrained conditions in which they were originally validated. S. Sri Harsha et al. [10] and Pejman Niksaz [12] proposed image processing-based frameworks for the detection, counting, and classification of vehicles in traffic video. For a more complex urban environment, advanced deep learning-based
object detection algorithms have shown high performance in the localization and classification of different object categories such as vehicles. In this regard, You Only Look Once (YOLO) is a state-of-the-art real-time object detection framework. Traditional detection methods typically involve several passes through an image. The YOLO algorithm treats object detection as a single regression problem, which makes it faster than its predecessors: it runs at real-time speeds while providing high mAP (mean average precision) scores. YOLO is used for traffic congestion estimation by detecting vehicles, classifying them, tracking their motion across multiple frames, and helping determine the traffic density level. The model predicts over a grid, which lets it detect multiple vehicles in a single pass and makes it well suited to real-time collection of traffic flow data and congestion assessment. Using deep learning and real-time video processing, YOLO can achieve high-speed, high-accuracy vehicle detection and is therefore suitable for smart traffic systems.

This study applies transfer learning to YOLO to build a system that estimates traffic congestion in real time. YOLO models trained on generic datasets can be adapted to specific application domains (e.g., road types) using transfer learning with domain-specific traffic datasets, enabling better detection and classification of vehicles under diverse road conditions. Using the density of vehicles across multiple frames, the system determines whether there is congestion and can make recommendations to traffic management authorities. Furthermore, it integrates an adaptive traffic signal control module that adjusts signal timings according to the intensity of vehicular congestion, thus enhancing the efficiency of road usage and reducing waiting time at junctions. The proposed solution is scalable, low-cost, and infrastructure-agnostic, and it can be configured for other cities. This offers a sustainable alternative to traditional traffic management systems, supporting the continued development of intelligent transportation systems (ITS) and smart city frameworks.

II. RELATED WORKS

Several studies have investigated the estimation of traffic congestion through machine learning and deep learning approaches. These studies present different approaches, including CNNs, object detection systems, and data-driven optimization models. Sungchul Seo et al. used deep learning techniques to detect temporary traffic control devices, which improve overall road safety, with a specific focus on construction sites. The authors demonstrated how deep learning models such as YOLO (You Only Look Once) can be used to improve detection accuracy in traffic management systems [1]. Gadge et al. proposed a smart traffic control system utilizing deep learning. Their model dynamically adjusts traffic signals based on real-time congestion levels detected using object recognition techniques. The study demonstrated that AI-driven traffic management significantly improves urban mobility compared to traditional fixed-time signal systems [2].

Rahman et al. investigated the application of image processing and data mining techniques for traffic density estimation. They implemented a hybrid system combining image-based vehicle detection with predictive data analytics to estimate congestion patterns. The study concluded that integrating data mining techniques with deep learning models improves congestion forecasting [3]. Redmon and Farhadi introduced YOLOv3 as a real-time object detection model. The model demonstrated improved speed and accuracy over previous versions, making it suitable for applications such as traffic surveillance. Their work laid the foundation for using YOLO in various computer vision tasks, including congestion estimation [4]. Bochkovskiy et al. further optimized YOLO with YOLOv4, enhancing its real-time detection capabilities while maintaining high accuracy. The authors introduced new training strategies that made YOLOv4 more efficient for large-scale traffic analysis. Their research reinforced the role of deep learning in intelligent traffic monitoring [5].

Jiang et al. developed an improved YOLOv5 model that incorporates a balanced feature pyramid and an attention module for enhanced traffic sign detection. Their model achieved superior accuracy in recognizing traffic signs under various environmental conditions. This research provides insights into optimizing object detection models for urban traffic analysis [6]. Lin et al. introduced the Microsoft COCO dataset, which is widely used for training and evaluating object detection models. Their dataset provides a rich collection of labelled images, enabling researchers to enhance deep learning models for real-world applications such as traffic congestion estimation [7]. Redmon et al. originally developed YOLO as a unified, real-time object detection framework. Their research demonstrated the advantages of YOLO over traditional object detection pipelines, emphasizing its speed and accuracy in detecting multiple objects per frame [8].

Arcos-Garcia et al. evaluated deep neural networks for traffic sign detection systems, emphasizing their efficiency in complex traffic environments [9]. Harsha and Sandeep developed a real-time traffic density estimation system using image processing techniques, improving vehicle counting accuracy [10]. Zhu et al. proposed a robust traffic sign detection and classification model that operates effectively under challenging real-world conditions [11].

Niksaz introduced an automatic traffic estimation system utilizing image processing methods, highlighting the importance of visual data in congestion management [12]. Venkataraman and Pinto discussed the impact of operations management on global supply chains, underscoring the necessity of efficient traffic control mechanisms in logistics [13]. Xu et al. examined multi-agent-based demand-responsive transport systems, aligning with adaptive traffic signal control methodologies [14].

Janrao et al. proposed a real-time traffic density estimation approach using image processing, optimizing vehicle count
accuracy for improved congestion monitoring [15]. Rahmat and Jumari explored vehicle detection using image processing techniques for traffic control and surveillance, proving its effectiveness in congestion reduction [16]. Talib et al. introduced an enhanced YOLOv8 model for real-time object detection, refining accuracy and computational efficiency [17].

Jocher et al. contributed significantly to the development of YOLO by providing extensive enhancements and open-source implementations [18]. Hussain presented a comprehensive review of YOLO models from v1 to v8, analyzing their evolutionary improvements in object detection performance [19]. Pragati Satpute and Dipti Patil researched object detection using YOLOv8, focusing on its applicability in real-time scenarios, further demonstrating its efficiency in traffic congestion monitoring [20].

The studies discussed above show, in general, that deep learning and object detection models can revolutionize existing traffic congestion monitoring systems. With advanced algorithms like YOLO, these systems provide real-time vehicle detection, traffic density analysis, and dynamic congestion classification that can be more efficient and scale to more roads than traditional approaches. Yet even with impressive state-of-the-art performance, gaps remain in accuracy, resource use, and wider deployment into smart city infrastructure. Such limitations will need to be addressed for wide-scale deployment to be possible, to better manage urban traffic, and to support the infrastructure for intelligent transportation systems that can cope with the environment.

III. METHODOLOGY

A. Proposed Methodology

The core objective of the system is to observe vehicle density in real time and accurately predict traffic congestion. It starts by capturing raw real-time video streams from a range of available sources, including roadside cameras, centralized traffic management systems, or drone-mounted cameras that can provide a bird’s-eye view of traffic behavior. These video feeds are the raw input to the system. After the video streams are collected, they enter the pre-processing phase, where the data is prepared for analysis. In this stage, the video frames are resized to a standardized resolution that matches the input size required by the YOLO model, color spaces may be converted (e.g., at times RGB to grayscale), and image quality may be enhanced where needed, to ensure that the object detection algorithm performs optimally. Proper data pre-processing is important for achieving better accuracy and efficiency in the following steps.

The pre-processed frames are then passed to the heart of the system: the YOLO-based model. YOLO is an object detection algorithm used to detect and localize vehicles in each frame. The model returns bounding boxes for the detected vehicles along with a confidence score for each detection, indicating how confident the model is that a vehicle is present. These bounding boxes and their confidence scores are then fed to a vehicle counting and density estimation module. This module extracts the key quantities, such as the number of vehicles detected within a frame or a specific region of interest, providing a quantitative assessment of traffic density. Vehicle density is a per-unit-area count of vehicles, giving a normalized sense of how congested the road space is. These values are then sent to a classification or regression module for the congestion level, which either maps the computed density to its corresponding congestion level (such as free-flow or congested) or outputs a continuous value derived from the density (which may allow simplified classification or even a continuous congestion scale). This second module could use ML algorithms or rule-based systems to carry out the classification or regression task.

Using the object detection capabilities of YOLOv8, the system analyzes vehicle density, speed, and flow patterns to estimate congestion. It processes real-time video feeds, identifies vehicles, and classifies congestion levels based on predefined thresholds. It also factors in contextual attributes like time of day and location to improve congestion predictions. The processed data is displayed on a user interface or dashboard, which gives users real-time information about congestion. Using this real-time data, traffic management authorities can make data-driven decisions by modifying traffic signal timings, stationing incident management resources, or providing the general driving population with traffic updates. Because it processes video feeds in real time, the system quickly analyzes incoming frames to produce relevant and timely insights into traffic conditions at each junction. Tracking lane changes and detecting nearby road signs and signal lights are not yet established capabilities in real traffic environments; however, combining advanced computer vision techniques, deep learning, and real-time data processing to handle this amount of information leads to important advances in intelligent traffic management and smart city initiatives.
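The chain described in this section, from detections to a vehicle count, a per-unit-area density, a congestion level, and a signal-timing suggestion, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the `Detection` record, the confidence cut-off, the region of interest, the density thresholds, the intermediate "moderate" level, and the green-time table are hypothetical and are not values reported in this paper. The detector itself is left out; the sketch starts from boxes and scores that a model such as YOLO would return.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detector output: box corners in pixels, confidence, class label."""
    x1: float
    y1: float
    x2: float
    y2: float
    confidence: float
    label: str


# Hypothetical settings; the paper does not state its actual values.
VEHICLE_LABELS = {"car", "bus", "truck", "motorcycle"}
MIN_CONFIDENCE = 0.5
# (upper bound in vehicles per square metre, level), checked in order.
DENSITY_LEVELS = [(0.02, "free-flow"), (0.05, "moderate")]
# Illustrative green-phase durations in seconds per congestion level.
GREEN_SECONDS = {"free-flow": 20, "moderate": 35, "congested": 50}


def count_vehicles(detections, roi):
    """Count confident vehicle detections whose box centre lies in the ROI."""
    rx1, ry1, rx2, ry2 = roi
    count = 0
    for d in detections:
        if d.label not in VEHICLE_LABELS or d.confidence < MIN_CONFIDENCE:
            continue
        cx, cy = (d.x1 + d.x2) / 2, (d.y1 + d.y2) / 2
        if rx1 <= cx <= rx2 and ry1 <= cy <= ry2:
            count += 1
    return count


def vehicle_density(count, road_area_m2):
    """Vehicles per square metre of monitored road surface."""
    if road_area_m2 <= 0:
        raise ValueError("road_area_m2 must be positive")
    return count / road_area_m2


def classify_congestion(density):
    """Map a density value to a congestion level using fixed thresholds."""
    for upper_bound, level in DENSITY_LEVELS:
        if density < upper_bound:
            return level
    return "congested"


# Example frame: three confident vehicles, one weak detection, one pedestrian.
detections = [
    Detection(10, 10, 50, 40, 0.91, "car"),
    Detection(60, 20, 120, 70, 0.84, "bus"),
    Detection(130, 30, 170, 60, 0.77, "truck"),
    Detection(20, 80, 40, 95, 0.30, "car"),      # below the confidence cut-off
    Detection(90, 85, 100, 99, 0.88, "person"),  # not a vehicle class
]
roi = (0, 0, 200, 100)
count = count_vehicles(detections, roi)
density = vehicle_density(count, road_area_m2=100.0)
level = classify_congestion(density)
green = GREEN_SECONDS[level]
print(count, density, level, green)
```

A rule-based classifier is used here only because the text allows either rules or a learned model for this step; a deployed system would tune the thresholds per camera and could smooth the level over several frames before changing a signal plan.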
Next, the recorded video stream enters a critical preprocessing stage. This process involves several steps, the first being frame extraction, where frames are captured from the continuous video stream at a specified frame rate. The frames are then resized so that every frame has a uniform input size compatible with the YOLO object detection model, and then normalized. Normalization usually consists of scaling the pixel values of the image to a specified range, which helps the stability and performance of the deep learning model. The preprocessed frames are subsequently fed into the core of the system: the YOLO object detection model. This pre-trained model, which has learned to recognize and categorize objects from a large dataset, analyzes each image to detect and classify the vehicles in the scene. The YOLO model produces a set of bounding boxes surrounding each detected vehicle, along with confidence scores that indicate the model's certainty about the presence and classification of the vehicle. After vehicles are detected and localized, congestion is assessed from vehicle density by a density-based congestion classification algorithm.

1) Sensor-Based Systems: Sensor-based traffic monitoring systems utilize physical sensors such as inductive loops, GPS trackers, and radar systems to collect traffic data. These devices can give reliable information regarding vehicle presence, speed, and density. However, they come with substantial downsides, including expensive installation and maintenance costs, which might be prohibitive for large-scale adoption. Additionally, their coverage is generally confined to specific locations, and they are prone to failures, particularly in adverse weather conditions such as heavy rain or snow. These constraints make sensor-based solutions less scalable and reliable for comprehensive traffic control in dynamic metropolitan contexts.

2) Traditional Image Processing: Traditional image processing techniques, such as edge detection, background subtraction, and motion tracking, have been widely employed for traffic monitoring. These approaches examine video feeds to recognize and track vehicles, but they struggle with difficulties including occlusions (e.g., vehicles obstructing one another), shadows, and fluctuating lighting conditions, which can lower their accuracy. Furthermore, typical image processing relies on pixel-wise operations, making it computationally expensive and inefficient for real-time applications. These constraints underscore the need for more advanced and adaptive approaches to overcome the limitations of existing methods.

4) Real-time traffic congestion estimation: After training, the YOLO model is used for real-time traffic congestion estimation. The approach in the workflow is as follows: capturing live traffic footage from road surveillance

Robust to Environmental Factors: Unlike conventional image processing techniques, YOLO can handle occlusions, varying lighting conditions, and different traffic densities efficiently.

Lower Computational Cost: By leveraging transfer learning, the system requires fewer computational resources compared to training deep learning models from scratch.

Fig. 3. Model Accuracy Learning Curve
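Returning to the preprocessing stage that precedes detection in this workflow (frame extraction at a specified rate, resizing to a uniform input size, and pixel normalization), the following is a minimal dependency-free sketch. The function names, the nearest-neighbour resizing method, the [0, 1] target range, and the frame rates are assumptions chosen for illustration; the paper does not specify them, and a real pipeline would normally use an image library rather than nested lists.

```python
def sample_frames(frames, source_fps, target_fps):
    """Keep roughly target_fps frames per second from a source_fps stream."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    step = max(1, round(source_fps / target_fps))
    return frames[::step]


def resize_nearest(image, out_h, out_w):
    """Resize a 2-D grayscale image (list of rows) by nearest-neighbour lookup."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalize(image):
    """Scale 8-bit pixel values into the range [0, 1]."""
    return [[p / 255.0 for p in row] for row in image]


# Example: a 30 fps stream sampled at about 5 fps, then one tiny frame
# resized from 2x2 to 4x4 and normalized.
frame_ids = list(range(30))
kept = sample_frames(frame_ids, source_fps=30, target_fps=5)

tiny = [[0, 255],
        [255, 0]]
resized = resize_nearest(tiny, 4, 4)
model_input = normalize(resized)
print(kept)
print(resized)
```

Fixing the output size once, as an argument, keeps every frame compatible with whatever input resolution the chosen YOLO variant expects.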
Model Loss Learning: The graph below displays the loss variation during training. Both training and validation loss exhibit a steady declining trend, indicating that the model minimizes the error efficiently without overfitting.

To ensure reliable congestion identification in adverse weather and night-time settings, the model underwent significant modifications. Data augmentation techniques, including synthetic rain, fog, and low-light transformations, were applied to the training dataset to improve resilience. Additionally, adaptive histogram equalization was employed to boost visibility in night-time images, while infrared-based vehicle identification was added for low-light conditions. These modifications resulted in a 15% reduction in false negatives under poor visibility conditions, maintaining an F1 score of 93.8% even in adverse weather.
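Two of the ideas above, a low-light transformation for augmentation and histogram equalization for night-time visibility, can be illustrated with a short sketch. It is deliberately simplified and should not be read as the authors' implementation: the gamma curve is an assumed stand-in for the unspecified low-light transformation, and plain global histogram equalization is swapped in for the adaptive variant named in the text, which equalizes local tiles rather than the whole image. Rain and fog synthesis are not covered.

```python
def darken(image, gamma=2.0):
    """Simulate low light on an 8-bit grayscale image (gamma > 1 darkens)."""
    return [[round(255 * (p / 255) ** gamma) for p in row] for row in image]


def equalize_histogram(image):
    """Global histogram equalization of an 8-bit grayscale image."""
    hist = [0] * 256
    for row in image:
        for p in row:
            hist[p] += 1

    # Cumulative distribution of pixel intensities.
    cdf, running = [], 0
    for h in hist:
        running += h
        cdf.append(running)

    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Single-intensity image: there is no contrast to stretch.
        return [row[:] for row in image]

    scale = 255 / (total - cdf_min)
    return [[round((cdf[p] - cdf_min) * scale) for p in row] for row in image]


# Example on a tiny 2x2 image: darken it, then recover contrast.
daylight = [[0, 64],
            [128, 255]]
night = darken(daylight)
enhanced = equalize_histogram(night)
print(night)
print(enhanced)
```

In the example the darkened mid-tones are spread back across the full 0 to 255 range, which is the effect equalization is meant to have on underexposed frames.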
A. Classification Metrics

The performance of the YOLO-based traffic congestion monitoring system was evaluated using key classification metrics such as accuracy, precision, recall, and F1 score. These metrics indicate how accurately the model detects and classifies congestion levels.

Accuracy (98%): Accuracy measures how many traffic situations (both congested and non-congested) the model identified correctly out of all cases. An accuracy of 98% indicates that the model performs well on real-world traffic.

Fig. 5. Performance Comparison Graphs

Figure 5 presents a bar graph comparing the performance of YOLOv5 and YOLOv8 across four parameters. The proposed model outperforms its competitors on all criteria, indicating its superior classification performance.
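For reference, the four classification metrics named in this section can be computed from confusion counts for a binary congested versus non-congested decision, as in the sketch below. The counts in the example are made up purely to exercise the formulas and are not results from this paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion counts.

    tp: congested frames predicted congested
    fp: non-congested frames predicted congested
    fn: congested frames predicted non-congested
    tn: non-congested frames predicted non-congested
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative counts only.
metrics = classification_metrics(tp=45, fp=5, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy alone can look high when one class dominates the data, which is why precision, recall, and F1 are reported alongside it.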
REFERENCES
[1] S. Seo, K. Choi, H. Park, and Y. Jeong, “Deep learning-based detection of temporary traffic control devices at construction sites for enhanced road safety,” Sensors, vol. 21, no. 4, 2021.
[2] A. Gadge, H. Bhawsar, P. Gondkar, P. Chourey, and G. Navale, “Smart traffic control system using deep learning,” in International Conference on Smart Cities and Advanced Technologies (ICSCAT), IEEE, 2022.
[3] M. Rahman and S. Dey, “Application of image processing and data mining techniques for traffic density estimation and prediction,” International Journal of Computer Vision and Image Processing (IJCVIP), 2020.
[4] J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” arXiv preprint arXiv:1804.02767, 2018.
[5] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, “YOLOv4: Optimal speed and accuracy of object detection,” arXiv preprint arXiv:2004.10934, 2020.
[6] L. Jiang, H. Liu, H. Zhu, and G. Zhang, “Improved YOLOv5 with balanced feature pyramid and attention module for traffic sign detection,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 4, pp. 2345-2356, 2023.