
Real-Time Object Detection and Instance Segmentation Using YOLOv7

PROJECT GUIDE
Mrs. B. Rajeswari
Sr. Asst. Professor
Dept. of ECE, LBRCE

TEAM MEMBERS
G. Karthik (20761A0422)
G. Manoj Kumar (20761A04E2)
D. Chareesh (20761A0414)
CONTENTS
• Abstract
• Introduction
• Literature Survey
• Problem Statement
• Proposed System
• Conclusion
• References

Abstract
Instance segmentation and object detection are distinct yet closely connected computer vision tasks. Instance segmentation aims to accurately locate and delineate distinct objects in an image, going beyond basic object identification by producing a pixel-level mask for every object instance. To produce reliable results, notable approaches in this domain, such as Mask R-CNN and FCN, frequently combine segmentation and object detection algorithms.
Object tracking, in contrast, follows an object's path through a series of successive frames in a video clip. The difficulty lies in retaining the object's identity under occlusion, scale changes, and appearance variation.

YOLOv7 is a significant advance in object tracking and instance segmentation, offering improved accuracy, a unified framework, real-time processing, and a sophisticated deep learning architecture. It provides an efficient method for object recognition and tracking along with optimized pixel-level segmentation masks, surpassing earlier approaches that depended on FCN and Mask R-CNN. To operate, YOLOv7 divides a video stream into frames.
A convolutional neural network then identifies objects within each frame, forecasts their movements, and follows them across subsequent frames. Its real-time capacity makes it well suited to a wide range of applications, including instance segmentation tasks, security surveillance, and autonomous driving.

Introduction
Real-time object detection is a computer vision task that involves identifying and locating objects within a video stream or a sequence of images in near real time, typically at frame rates sufficient for applications where timely decision-making is crucial. The goal is to process incoming visual data quickly and accurately, providing instantaneous feedback on the presence and location of objects in the scene. This capability finds applications in various domains, including autonomous vehicles, surveillance systems, robotics, and augmented reality.
YOLO, which stands for "You Only Look Once," is a groundbreaking object detection algorithm in computer vision. Developed to address the need for faster, more efficient object detection, YOLO takes a unique approach by framing the task as a single regression problem, predicting bounding boxes and class probabilities directly from the entire image in a single pass through the neural network. Unlike traditional two-step methods such as region-based convolutional neural networks (R-CNN), which involve separate region proposal and classification stages, YOLO's one-step process significantly reduces computation time, making it well suited for real-time applications.
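To make the "single regression" idea concrete, the sketch below decodes one YOLO-style grid-cell prediction into an image-space bounding box. The sigmoid/exponential decode follows the general YOLO family; the exact parameterization in YOLOv7 may differ, and all names here are illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_cell(tx, ty, tw, th, cx, cy, anchor_w, anchor_h, stride):
    """Map raw network outputs (tx, ty, tw, th) for grid cell (cx, cy)
    to an (x1, y1, x2, y2) box in pixels, YOLO-family style."""
    x = (sigmoid(tx) + cx) * stride   # box centre x in pixels
    y = (sigmoid(ty) + cy) * stride   # box centre y in pixels
    w = anchor_w * math.exp(tw)       # width scaled from the anchor prior
    h = anchor_h * math.exp(th)       # height scaled from the anchor prior
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)

# Zero raw outputs place the box at the cell centre with the anchor's size.
box = decode_cell(0.0, 0.0, 0.0, 0.0, cx=3, cy=4, anchor_w=32, anchor_h=32, stride=16)
print(box)  # → (40.0, 56.0, 72.0, 88.0)
```

Because every cell of every scale is decoded this way in one forward pass, no separate region-proposal stage is needed.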

LITERATURE SURVEY
1. Enhanced lung image segmentation using deep learning
   Techniques: FCN, SegNet, U-Net, U-Net++
   Datasets: Montgomery County X-ray set, Shenzhen Hospital X-ray set
   Metrics: Accuracy 95.84%, Precision 89.04%, Recall 89.05%

2. A Survey on Instance Segmentation: State of the Art
   Techniques: Mask R-CNN, TensorMask, DeepMask, InstanceFCN, HTC, PANet, Mask Scoring R-CNN, MPN, YOLACT, Deep Watershed Transform, InstanceCut, Fast R-CNN, Faster R-CNN
   Datasets: Microsoft Common Objects in Context (COCO) Dataset, PASCAL VOC, Cityscapes, Mapillary Vistas Dataset (MVD)
   Metrics: Average Precision (AP) 48.6%

3. Leaf Instance Segmentation and Counting based on Deep Object Detection and Segmentation Networks
   Techniques: Mask R-CNN
   Datasets: images of 27 tobacco (A3) and 159 Arabidopsis (A1 and A2) plants
   Metrics: Average Precision 89.9%, Symmetric Best Dice (SBD) 76.3%
4. HCFS3D: Hierarchical coupled feature selection network for 3D semantic and instance segmentation
   Techniques: PointNet++, Graph convolutional networks (GCNs), Sparse convolution
   Datasets: Stanford 3D Indoor Semantics Dataset (S3DIS), Richly-annotated 3D Reconstructions of Indoor Scenes (ScanNet-v2), Part segmentation dataset (ShapeNet)
   Metrics: mean intersection over union (mIoU) 89.1%, Average Precision 54.01%

5. A new deep-learning strawberry instance segmentation methodology based on a fully convolutional neural network
   Techniques: fully convolutional neural network (CNN), U-Net
   Datasets: StrawDI_Db1 dataset
   Metrics: mean average precision (mAP) 52.61%, mean instance intersection over union (mI2oU) 93.38%, frames per second (fps) 30

6. MSIS: Multispectral Instance Segmentation Method for Power Equipment
   Techniques: multispectral single-stage instance segmentation (MSIS)
   Datasets: Power Equipment Dataset, FLIR Thermal Imaging Dataset
   Metrics: average precision (AP) 40.06%, F1 score 62.37%

7. Application of one-stage instance segmentation with weather conditions in surveillance cameras at construction sites
   Techniques: YOLACT
   Datasets: MS COCO dataset
   Metrics: average precision (AP) 71.5%

8. Augmented Reality Meets Deep Learning for Car Instance Segmentation in Urban Scenes
   Techniques: Multi-task Network Cascade (MNC), Ground Plane Estimation, 360-Degree Panoramas
   Datasets: KITTI-360, KITTI-15, Virtual KITTI
   Metrics: average precision (AP) 72.8%
9. People Tracking in Video Surveillance Systems Based on Artificial Intelligence
   Techniques: Faster R-CNN, GOTURN tracking algorithm, YOLOv7
   Datasets: 23-day 15-camera dataset, Caremedia 23d dataset
   Metrics: Average Precision (AP) 95.3%

10. Long-Distance Person Detection Based on YOLOv7
    Techniques: YOLOv7
    Datasets: TinyPerson dataset
    Metrics: Average Precision (AP) 95.14%

11. YOLOv7-UAV: An Unmanned Aerial Vehicle Image Object Detection Algorithm Based on Improved YOLOv7
    Techniques: YOLOv7
    Datasets: VisDrone2019 Dataset, TinyPerson Dataset
    Metrics: Average Precision (AP) 96.22% (VisDrone2019), 95.75% (TinyPerson)
Problem Identified

• Mask R-CNN: Mask R-CNN is a deep learning model designed for instance segmentation. It extends Faster R-CNN with a branch that predicts pixel-level masks in addition to bounding boxes. This makes it possible to precisely separate objects in images, which is beneficial for applications such as autonomous driving and medical image analysis.

• FCN: A Fully Convolutional Network (FCN) is a deep learning architecture designed for semantic segmentation. It uses convolutional layers, skip connections, and upsampling layers to generate dense pixel-wise predictions, making it suitable for applications such as object recognition and image segmentation.

Mask R-CNN and FCN may not be the most effective solutions for real-time applications such as live tracking, as they are mainly intended for image segmentation tasks. Lightweight models such as YOLO (You Only Look Once) are better suited to fast real-time tracking; because they are optimized for speed, they perform better in situations where low latency is essential.

Compared to Mask R-CNN and FCNs, YOLOv7's primary advantages are its speed and its capacity for simultaneous object detection. Although Mask R-CNN and FCNs can both generate more precise segmentation data, they are typically slower and may not be as appropriate for real-time applications such as autonomous cars and video surveillance.

PROPOSED SYSTEM

YOLOv7 is an object detection algorithm based on a deep learning architecture. It uses convolutional neural networks (CNNs) to identify and locate objects in images. YOLOv7 can detect multiple objects in an image simultaneously and is faster than some other object detection algorithms. It can be used for a variety of applications, including video surveillance and autonomous vehicles.

YOLOV7 ARCHITECTURE

The architecture provided appears to be a key component of the YOLOv7 model,
showcasing the internal workings of the algorithm. Let's delve into a detailed explanation
of this architecture:
• Backbone: The "Backbone" section likely corresponds to the initial layers of the model,
typically containing powerful convolutional neural networks (CNNs) such as
CSPDarknet53. This stage is responsible for processing the input image and extracting
hierarchical features.
• Feature Pyramid Network (FPN): The "FPN" section represents the Feature Pyramid
Network, which generates feature maps at different scales and facilitates multi-scale
object detection. This allows the model to effectively detect objects of various sizes
within the input image.
• Neck Architecture: The "Neck Architecture" could incorporate specific network layers
or modules, possibly based on PANet or CSPNet designs. This part of the architecture
aims to further improve feature extraction and information fusion across different scales
within the feature pyramid.
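The three stages above can be pictured as a simple composition. The sketch below is purely illustrative: the function names and the "P3"/"P4"/"P5" scale labels are assumptions, and each stub stands in for a large CNN module in the real model.

```python
def backbone(image):
    """Extract hierarchical features at three scales (stub)."""
    return {"P3": image, "P4": image, "P5": image}

def fpn(features):
    """Build a feature pyramid across scales (stub: identity)."""
    return features

def neck(pyramid):
    """Fuse information across pyramid levels, PANet-style (stub: identity)."""
    return pyramid

# The stages chain: raw image -> multi-scale features -> fused pyramid.
fused = neck(fpn(backbone("input image")))
print(sorted(fused))  # → ['P3', 'P4', 'P5']
```

The fused multi-scale features are what the detection head (next slide) consumes to predict boxes at each scale.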

• Detection Head: The "Detection Head" is likely the portion responsible for predicting
bounding boxes, objectness scores, and class probabilities for objects at various scales.
This step aids in the precise identification and localization of objects in the input
image.

• Post-processing: The model further applies post-processing techniques, such as non-maximum suppression (NMS), to refine the predicted bounding boxes and produce a final set of confident, non-overlapping object detections.
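Greedy NMS, as used in this post-processing step, can be sketched in a few lines. This is a generic textbook version, not YOLOv7's exact implementation (which is batched and class-aware):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring boxes, dropping any box that overlaps
    an already-kept box by more than iou_thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # → [0, 2]: the second box overlaps the first heavily
```

The duplicate detection of the same object (box 1) is suppressed, leaving one box per object.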

• Output: The final output contains the bounding box coordinates, class labels, and
confidence scores for the detected objects, presenting the precise localization and
identification of objects within the input image.
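A final confidence score is typically formed by combining the objectness score with the best class probability, then thresholding. The sketch below illustrates this step; the threshold value and data layout are assumptions for illustration.

```python
def final_detections(raw, conf_thresh=0.25):
    """Combine objectness and class probabilities into final confidences,
    keeping only detections above the threshold.
    `raw` holds (box, objectness, {class_label: prob}) tuples."""
    out = []
    for box, obj, class_probs in raw:
        label, p = max(class_probs.items(), key=lambda kv: kv[1])
        conf = obj * p  # confidence = objectness × best class probability
        if conf >= conf_thresh:
            out.append((box, label, round(conf, 3)))
    return out

raw = [((0, 0, 10, 10), 0.9, {"car": 0.8, "person": 0.2}),
       ((5, 5, 15, 15), 0.3, {"car": 0.5, "person": 0.5})]
print(final_detections(raw))  # keeps only the first: 0.9 * 0.8 = 0.72
```

The low-objectness second candidate falls below the threshold and is discarded, so only confident detections reach the output.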

RESULTS
1. Traffic
2. Image
3. Webcam
4. Live video streaming
CONCLUSION
In summary, this research introduces a real-time object detection system leveraging the YOLOv7 algorithm, specifically using the yolov7-w6.pt and yolov7-e6e.pt weights to detect and classify traffic participants, including vehicles, pedestrians, and various objects. The study encompasses the collection of diverse datasets, contributing to a robust evaluation of the algorithm's performance. Results indicate a notable improvement in precision and processing speed compared to previous iterations of the YOLO algorithm, showcasing the efficacy of the proposed system. Notably, the inclusion of a multiscale feature pyramid network and a bounding box regression algorithm enhances the accuracy of traffic participant identification. The findings suggest that the developed real-time system holds promise for integration into smart road systems, self-driving cars, and intelligent railway networks. The potential impact on accident reduction adds a compelling dimension to the system's relevance and applicability in advancing safety measures.

REFERENCES
• [1] Gite, S., Mishra, A., & Kotecha, K. (2022). Enhanced lung image segmentation using deep learning. Neural Computing and Applications, 1-15.
• [2] Hafiz, A. M., & Bhat, G. M. (2020). A survey on instance segmentation: state of the art. International Journal of Multimedia Information Retrieval, 9(3), 171-189.
• [3] Xu, L., Li, Y., Sun, Y., Song, L., & Jin, S. (2018, December). Leaf instance segmentation and counting based on deep object detection and segmentation networks. In 2018 Joint 10th International Conference on Soft Computing and Intelligent Systems (SCIS) and 19th International Symposium on Advanced Intelligent Systems (ISIS) (pp. 180-185). IEEE.
• [4] Tan, J., Wang, K., Chen, L., Zhang, G., Li, J., & Zhang, X. (2021). HCFS3D: Hierarchical coupled feature selection network for 3D semantic and instance segmentation. Image and Vision Computing, 109, 104129.
• [5] Perez-Borrero, I., Marin-Santos, D., Vasallo-Vazquez, M. J., & Gegundez-Arias, M. E. (2021). A new deep-learning strawberry instance segmentation methodology based on a fully convolutional neural network. Neural Computing and Applications, 33(22), 15059-15071.
• [6] Shu, J., He, J., & Li, L. (2022). MSIS: Multispectral instance segmentation method for power equipment. Computational Intelligence and Neuroscience, 2022.
• [7] Kang, K. S., Cho, Y. W., Jin, K. H., Kim, Y. B., & Ryu, H. G. (2022). Application of one-stage instance segmentation with weather conditions in surveillance cameras at construction sites. Automation in Construction, 133, 104034.
• [8] Alhaija, H. A., et al. (2017). Augmented reality meets deep learning for car instance segmentation in urban scenes. British Machine Vision Conference, Vol. 1, No. 2.
• [9] Nasry, A., Ezzahout, A., & Omary, F. (2023). People tracking in video surveillance systems based on artificial intelligence. Journal of Automation, Mobile Robotics and Intelligent Systems, 17(1), 59-68.
• [10] Tang, F., Yang, F., & Tian, X. (2023). Long-distance person detection based on YOLOv7. Electronics, 12(6), 1502.
• [11] Zeng, Y., Zhang, T., He, W., & Zhang, Z. (2023). YOLOv7-UAV: An unmanned aerial vehicle image object detection algorithm based on improved YOLOv7. Electronics, 12(14), 3141.