
OBJECT DETECTION USING YOLO

A
Seminar Report
Submitted to
Jawaharlal Nehru Technological University, Hyderabad
in Partial Fulfilment of the requirements for the Award of the Degree
of

Bachelor of Technology
in Computer Science and Engineering (Data Science)

By
Jakkula Nanda Kishore Yadav 21E11A6723

Under the Guidance of

MANOHAR GOSUL
Assistant Professor, Department of Computer Science and Engineering

Department of Computer Science and Engineering

BHARAT INSTITUTE OF ENGINEERING AND TECHNOLOGY


(Affiliated to JNTU Hyderabad, Approved by AICTE, Accredited by NAAC)
Ibrahimpatnam – 501 510, Hyderabad, Telangana
2021-2025 Batch

CERTIFICATE
This is to certify that the seminar work entitled “OBJECT DETECTION USING YOLO” is a bona fide work carried out by

Jakkula Nanda Kishore Yadav 21E11A6723

in the Department of Computer Science and Engineering (Data Science) at Bharat Institute of Engineering and Technology, Hyderabad, and is submitted to Jawaharlal Nehru Technological University, Hyderabad, in partial fulfilment of the requirements for the award of the degree of Bachelor of Technology in Computer Science and Engineering (Data Science) during 2024-25.

Guide: Head of the Department:

Mr. Manohar Gosul Dr. E. Srilaxmi


Assistant Professor Associate Professor
Dept. of CSE, Dept. of CSE,
BIET, Hyderabad. BIET, Hyderabad.

Principal
BIET, Hyderabad.

Viva-Voce held on: ______________


List of examiners Signature with date
1. Internal Examiner
2. External Examiner

DECLARATION

I, Jakkula Nanda Kishore Yadav (21E11A6723), hereby declare that this Seminar
Report titled “OBJECT DETECTION USING YOLO” is a genuine work carried out
by me in the B.Tech (Computer Science and Engineering – Data Science) degree
course of Jawaharlal Nehru Technological University, Hyderabad, and has not been
submitted to any other course or university for the award of a degree.

Jakkula Nanda Kishore Yadav 21E11A6723

Submitted By
Name: JAKKULA NANDA KISHORE YADAV
21E11A6723
SEMESTER – 7TH
ACKNOWLEDGEMENT

Over the span of one year, BIET has helped us transform from amateurs in the
field of Computer Science into skilled engineers capable of handling any given
situation in real time. We are highly indebted to the institute for everything it
has given us.

I would like to express my gratitude to the Principal of our institute and the Head
of the CSE Department for their kind cooperation and encouragement, which helped
us complete the project in the stipulated time.

Although we have spent a lot of time and effort on this seminar project, it would
not have been possible without the motivating support and help of our project guide.
I thank him for his guidance, constant supervision, and for providing the information
necessary to complete this project. My thanks and appreciation also go to all the
faculty and staff members of BIET who helped make this project successful.
ABSTRACT

Object detection based on YOLO has achieved very good performance. However,
images captured in the real world suffer from problems such as noise, blurring, and
rotational jitter, and these problems have an important impact on object detection.
Using traffic signs as an example, we established image-degradation models based on
the YOLO network, combined with traditional image-processing methods, to simulate
the problems found in real-world shooting. After establishing the different
degradation models, we compared their effects on object detection. We then used the
YOLO network to train a robust model that improves the average precision (AP) of
traffic-sign detection in real scenes. The algorithm operates by dividing the input
image into a grid and predicting bounding boxes, objectness scores, and class
probabilities for each cell, followed by a non-maximum suppression step to filter
redundant detections. Object detection is an important branch of computer vision;
the general goals of object detection are speed and accuracy, and a good
object-detection algorithm should be convenient for people's lives.

Keywords— YOLO network, image processing, object detection


Table of Contents:

S.no TOPIC
1. Introduction
1.1- Introduction to YOLO detection
1.2- Working of YOLO algorithm
1.3- Objectives
1.4- Methodology
2. Literature Survey
3. System development
4. Personal Analysis
4.1- Sample inputs and outputs
5. Conclusion and Future use
6. References
I. INTRODUCTION:

1.1 Introduction to YOLO detection


YOLO detection system: When we glance at a photograph, we instantly recognize
the objects in it, where they are, and how they interact. The human visual
system is fast and accurate, allowing us to perform complex tasks such as
driving with little conscious thought. Fast, accurate object detection would
likewise allow computers to drive cars without specialized sensors in any
weather, enable assistive devices to convey real-time scene information to human
users, and unlock the potential of responsive, general-purpose robotic systems.
There are different techniques for detecting objects, and they can be divided
into two categories. The first kind of algorithm is based on classification,
e.g. CNN and R-CNN: regions of interest are selected in the image, and
convolutional neural networks are used to classify them. The second category
comprises algorithms based on regression, and the YOLO approach belongs to this
class. Here we do not select regions of interest; instead, we predict the
classes and bounding boxes for the entire image in a single pass, using a single
neural network to detect many objects.

Figure: The YOLO detection system. Processing images with YOLO is simple and
straightforward. Our system (1) resizes the input image to 448 x 448, (2) runs a
single convolutional network on the image, and (3) thresholds the resulting
detections by the model's confidence.
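The three steps above can be sketched in code. Below is a minimal, illustrative Python sketch of step (3), thresholding candidate detections by confidence; the detection list, the field names, and the 0.5 cutoff are all assumptions for illustration, not the report's actual implementation.

```python
# Minimal sketch of YOLO's final thresholding step: keep only
# detections whose confidence exceeds a chosen cutoff.
# The detections below are made-up examples, not real model output.

def threshold_detections(detections, conf_threshold=0.5):
    """Keep detections whose confidence score exceeds the threshold."""
    return [d for d in detections if d["confidence"] > conf_threshold]

detections = [
    {"label": "car", "confidence": 0.92, "bbox": (48, 60, 120, 80)},
    {"label": "person", "confidence": 0.31, "bbox": (10, 20, 30, 70)},
    {"label": "dog", "confidence": 0.77, "bbox": (200, 150, 60, 40)},
]

kept = threshold_detections(detections, conf_threshold=0.5)
print([d["label"] for d in kept])  # ['car', 'dog']
```

Raising the cutoff trades recall for precision: with a 0.9 threshold only the "car" detection would survive.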

In addition, YOLO achieves more than double the mean average precision of other
real-time systems. Second, YOLO reasons globally about the image at training and
test time, so it implicitly encodes contextual information about classes as well as
their appearance. Fast R-CNN, a top detection method, mistakes background patches
in an image for objects because it cannot see the larger context; in contrast, YOLO
makes less than half as many background errors as Fast R-CNN. Third, YOLO learns
generalizable representations of objects. When trained on natural images and tested
on artwork, YOLO outperforms detection methods such as DPM and R-CNN by a wide
margin. Being highly generalizable, it is less likely to break down when applied to
new domains or unexpected inputs.

1.2- Working of YOLO algorithm


First, an image is taken and the YOLO algorithm is applied. In our example, the
image is divided into a 3x3 grid; an image can be divided into a grid of any size
depending on its complexity. Each grid cell is then classified and localized with
respect to the objects it contains, and an objectness (confidence) score is computed
for each cell. If no object is found in a cell, the objectness score and
bounding-box values for that cell are zero; if an object is found, the objectness
score is 1 and the bounding-box values describe the object's box. Low-scoring
candidate boxes are rejected.

The YOLO algorithm is used to locate objects precisely in the image. The image is
divided into an N x N grid, with bounding-box predictions and class probabilities
for each cell. Image classification and object localization are applied to each
grid cell, and each predicted box is tagged with a label. The algorithm then checks
each cell independently, records the label of any object found, and outputs its
bounding box. Cells containing no object are discarded.

If two or more cells contain the same object, the algorithm determines the cell
closest to the object's center and keeps the box from that cell. To obtain precise
boxes, it uses non-maximum suppression rather than keeping every overlapping
detection. Non-maximum suppression relies on the Intersection over Union (IoU) of
the ground-truth and predicted boxes, computed with the following formula:
IoU = Area of overlap / Area of union.
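The IoU formula can be written directly in code. The following is a minimal sketch, assuming axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates (the coordinate convention is an assumption for illustration):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Overlap area is zero when the boxes do not intersect.
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Identical boxes give IoU = 1; disjoint boxes give IoU = 0.
# These two 2x2 boxes overlap in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```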
Unified Detection:
Here, the separate components of object detection are unified into a single neural
network. Our network uses features from the entire image to predict each bounding
box, and it predicts all boxes for an image at the same time. This means the
network reasons globally about the full image and all the objects in it. The YOLO
design enables end-to-end training at real-time speeds while maintaining high
average precision.

Our system divides the input image into an N x N grid. If the center of an object
falls into a grid cell, that cell is responsible for detecting the object. Each
grid cell predicts B bounding boxes and a confidence score for each. These
confidence scores reflect how certain the model is that a box contains an object
and how accurate it believes the predicted box to be.

The (x, y) coordinates represent the center of the box relative to the bounds of
the grid cell, while the width and height are predicted relative to the whole
image. Finally, the confidence prediction represents the agreement between the
predicted box and the ground truth. Each grid cell also predicts the conditional
class probabilities Pr(class | object). These probabilities are conditioned on the
grid cell containing an object; we predict only one set of class probabilities per
grid cell, regardless of the number of boxes B.
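The grid-cell responsibility rule described above can be sketched as follows. This is an illustrative helper (the function name and pixel-coordinate convention are assumptions), shown for the S = 7, 448 x 448 configuration used by YOLO:

```python
def responsible_cell(cx, cy, image_size, S):
    """Return the (row, col) of the grid cell containing the object's
    center (cx, cy), for an S x S grid over a square image_size image."""
    cell = image_size / S                 # side length of one grid cell
    col = min(int(cx // cell), S - 1)     # clamp points on the far edge
    row = min(int(cy // cell), S - 1)
    return row, col

# With S = 7 on a 448 x 448 image, each cell is 64 x 64 pixels,
# so an object centered at (100, 350) lands in row 5, column 1.
print(responsible_cell(100, 350, 448, 7))  # (5, 1)
```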
1.3- Objectives
The aim of this work is to understand object detection using the You Only Look
Once (YOLO) method. This approach has several advantages over other
object-recognition algorithms. Unlike algorithms such as the Convolutional Neural
Network (CNN) and Fast R-CNN families, which do not look at the image as a whole,
YOLO processes the complete image in a single pass: a convolutional network
predicts the bounding boxes together with the class probabilities for those boxes,
so YOLO recognizes objects faster than the alternative algorithms.
Object-detection techniques fall into two broad classes; the first comprises
algorithms based on classification.

1.4- Methodology

The fundamental steps followed in object detection using YOLO, as presented in the
paper by Joseph Redmon et al. [1], are as follows:
● An S x S grid is obtained by dividing the picture. B bounding boxes are
predicted by every grid cell.
● Intersection over Union (IoU): a metric for evaluation in object detection.
Consider Figure 1, where the ground-truth and predicted bounding boxes are shown.
We compute IoU as IoU = Area of overlap / Area of union.

The Architecture. Our detection network has 24 convolutional layers followed by 2 fully connected
layers. Alternating 1 x 1 convolutional layers reduce the feature space from the preceding layers. We pretrain
the convolutional layers on the ImageNet classification task at half the resolution (224 x 224 input
image) and then double the resolution for detection.

The confidence score for each bounding box is given as Pr(Object) · IoU(truth, pred).
The confidence score for cells with no object should be zero. A total of 5
predictions are made for each bounding box: x, y, w, h, and the confidence,
alongside the C conditional class probabilities. The center of the box relative to
the grid cell is (x, y); w is the width and h is the height, both relative to the
whole image; C is the set of conditional class probabilities. At test time, the
following formula gives the confidence score of each class for each box:

Pr(Class_i | Object) · Pr(Object) · IoU(truth, pred) = Pr(Class_i) · IoU(truth, pred)

This gives us two values: the probability that a class is present in the box, and
how well the predicted box fits the object. The predictions are encoded as an
S x S x (B*5 + C) tensor. When evaluating YOLO on PASCAL VOC [6], which has 20
labeled classes (C = 20), we take S = 7 and B = 2; the final prediction is
therefore a 7 x 7 x 30 tensor.
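As a quick check of this encoding, the tensor size and the class-specific confidence can be computed directly; the helper names below are illustrative, and the probability values are made up:

```python
def output_shape(S, B, C):
    """YOLO's prediction tensor is S x S x (B*5 + C)."""
    return (S, S, B * 5 + C)

def class_confidence(p_class_given_obj, p_obj_times_iou):
    """Class-specific confidence: Pr(class|obj) * Pr(obj) * IoU."""
    return p_class_given_obj * p_obj_times_iou

# PASCAL VOC settings: S = 7, B = 2, C = 20 -> a 7 x 7 x 30 tensor.
print(output_shape(7, 2, 20))      # (7, 7, 30)
print(class_confidence(0.8, 0.9))  # ~0.72
```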

II. LITERATURE SURVEY:


YOLO, by Joseph Redmon et al. Prior work approached object detection by
repurposing classifiers. To achieve both high accuracy and high speed, the authors
proposed the YOLO algorithm in this paper. Compared with object-detection
pipelines before YOLO, such as R-CNN, YOLO uses a single unified network design:
it extracts candidate boxes from the picture and finds the class probability of
each in one pass, which makes YOLO much faster while giving good accuracy, and it
also generalizes well to artwork. YOLO based on the CNN family, by Juan Du. This
paper explains how the CNN, R-CNN, and related families localize objects, examines
their effectiveness, and places the YOLO algorithm within this increasingly
competitive field. Structured output regression for detection, by Matthew B.
Blaschko.

That paper is about localizing objects. The authors used a bounding-box regression
strategy to localize objects, overcoming the shortcomings of sliding-window
techniques. Fast YOLO is a general-purpose object detector with the fastest
reported speed, and YOLO steadily improves object recognition. YOLO also transfers
well to new domains, making it ideal for applications that require fast, reliable
localization of objects. Ryumin Zhang and Yifeng Yang used YOLO for obstacle
detection in their research. Images of common obstacles were labeled and used to
train YOLO, and image preprocessing was used to reduce the effects of lighting.
Various scenarios, such as pedestrians, chairs, and books, were demonstrated to
prove the feasibility of this obstacle-detection algorithm.

III. SYSTEM DEVELOPMENT:


In this part, we discuss the choices made when building the system. After some
review, we decided to use non-intrusive strategies. There are different ways of
recognizing objects and adding captions, but we chose the YOLOv3 method with
contour-based scanning, because it is more robust and meaningful under different
conditions.
We design a system which depends on the following two stages:
1. Object Detection
2. Captioning the image and video feedback.
Experimental setup:
 Environment: Python was chosen as the programming language because it is a
major language that is easy to learn and program, which makes it widely used
for building artificial-intelligence and deep-learning systems. The detector
composition (as described in the YOLOv4 paper, which reuses a YOLOv3 detection
head) includes:
o Backbone: CSPDarknet53
o Neck: SPP [25], PAN
o Head: YOLOv3
 Bag of Specials (BoS) for the backbone: Mish activation, Cross-Stage Partial
connections (CSP), Multi-input Weighted Residual Connections (MiWRC)
 Bag of Freebies (BoF) for the detector: CIoU loss, CmBN, DropBlock
regularization, mosaic data augmentation, self-adversarial training,
elimination of grid sensitivity, using multiple anchors for a single ground
truth, cosine-annealing scheduler, optimal hyperparameters, random training
shapes
 Bag of Specials (BoS) for the detector: Mish activation, SPP block, SAM
block, PAN path-aggregation block, DIoU-NMS. In the ImageNet
image-classification experiments, the default hyperparameters are as follows:
training steps 8,000,000; batch size and mini-batch size 128 and 32,
respectively; a polynomial-decay learning-rate schedule with an initial
learning rate of 0.1; warm-up steps 1,000; momentum and weight decay set to
0.9 and 0.005, respectively. All our BoS experiments use the same
hyperparameters as the default, and in the BoF experiments we add an
additional 50% of training steps. In the BoF experiments we verify MixUp,
CutMix, mosaic and blurring data augmentation, and label-smoothing
regularization. In the BoS experiments we compare the effects of LReLU,
Swish, and Mish activations on performance. For the MS COCO object-detection
experiments, the following hyperparameters are used by default:

training steps 500,500; a step-decay learning-rate schedule with an initial
learning rate of 0.01, multiplied by 0.1 at the 400,000-step and 450,000-step
marks; momentum and weight decay set to 0.9 and 0.0005, respectively. All models
use a single GPU for multi-scale training with a batch size of 64 and a mini-batch
size of 8 or 4, depending on the model and GPU memory limits. Except for the
hyperparameter-search experiments, which use a genetic algorithm, all other
experiments use the default settings. The genetic algorithm used YOLOv3-SPP
trained with GIoU loss and searched for 300 epochs on the min-val 5k set, adopting
a searched learning rate of 0.00261, momentum 0.949, an IoU threshold of 0.213 for
ground-truth assignment, and a loss normalizer of 0.07. We verified a large number
of BoF techniques, including grid-sensitivity elimination, mosaic data
augmentation, IoU thresholding, genetic algorithms, class-label smoothing, cross
mini-batch normalization, self-adversarial training, cosine-annealing scheduling,
dynamic mini-batch sizes, DropBlock, optimized anchors, and different kinds of IoU
losses. We also conducted experiments on BoS techniques, including Mish, SPP, SAM,
RFB, BiFPN, and Gaussian YOLO [8]. For all experiments we use only one GPU for
training, so techniques such as syncBN, which synchronizes across multiple GPUs,
are not applied.

Influence of different features on detector training: further experiments concern
the influence of different Bag-of-Freebies (BoF-detector) options on training
accuracy. We significantly expand the BoF list by studying features that increase
detector accuracy without affecting FPS:
• M: Mosaic data augmentation - using a 4-image mosaic during training instead
of a single image
• IT: IoU threshold - using multiple anchors for a single ground truth when
IoU(truth, anchor) > IoU threshold
• GA: Genetic algorithms - using genetic algorithms to select the optimal
hyperparameters during the first 10% of training
• LS: Class-label smoothing - using label smoothing for the sigmoid activation
• CBN: CmBN - using Cross mini-Batch Normalization to collect statistics over
the entire batch instead of within a single mini-batch
• CA: Cosine-annealing scheduler - varying the learning rate along a sinusoid
during training
• DM: Dynamic mini-batch size - automatic increase of the mini-batch size
during small-resolution training, using random training shapes
• OA: Optimized anchors - using anchors optimized for the 512 x 512 network
resolution
• GIoU, CIoU, DIoU, MSE - using different loss functions for bounding-box
regression.

Inference
As in training, predicting detections for a test image requires only a single
network evaluation. On PASCAL VOC the network predicts 98 bounding boxes per
image, along with class probabilities for each box. YOLO is extremely fast at test
time because, unlike classifier-based methods, it needs to evaluate only a single
network. The grid design enforces spatial diversity in the bounding-box
predictions. It is usually clear which grid cell an object falls into, and the
network predicts only one box for each object. However, large objects, or objects
near the border of multiple cells, can be localized by several cells at once.
Non-maximum suppression is used to remove these duplicate detections. Although it
is not as critical to performance as it is for R-CNN or DPM, non-maximum
suppression adds 2-3% to the mAP.
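The non-maximum suppression step can be sketched as a greedy procedure: keep the highest-scoring box, discard any remaining box that overlaps it too much, and repeat. This is a generic NMS sketch (the box format, the sample scores, and the 0.5 threshold are assumptions for illustration), not the network's internal code:

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(detections, iou_threshold=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and drop
    any remaining box that overlaps it by more than iou_threshold."""
    remaining = sorted(detections, key=lambda d: d["score"], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [d for d in remaining
                     if iou(best["box"], d["box"]) <= iou_threshold]
    return kept

# Two near-duplicate boxes and one separate box (made-up values):
dets = [
    {"box": (0, 0, 10, 10), "score": 0.9},
    {"box": (1, 1, 11, 11), "score": 0.8},   # heavy overlap with the first
    {"box": (50, 50, 60, 60), "score": 0.7},
]
print(len(nms(dets)))  # 2
```

Real detectors typically apply this per class; the 0.5 IoU threshold is a common but tunable choice.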

COCO dataset
Common Objects in Context (COCO) is a large-scale image dataset designed for
object detection, segmentation, person-keypoint detection, stuff segmentation, and
captioning. The dataset contains 330,000 images, 1.5 million object instances, and
more than 200,000 labeled images. The objects are divided into 80 categories, and
Matlab, Python, and Lua APIs are provided that let developers load, parse, and
visualize the annotations. Other material such as data, papers, and tutorials is
available on the COCO website. For example, an object-detection annotation has the
following structure:

annotation{
  "id": int,
  "image_id": int,
  "category_id": int,
  "segmentation": RLE or [polygon],
  "area": float,
  "bbox": [x, y, width, height],
  "iscrowd": 0 or 1,
}
categories[{
  "id": int,
  "name": str,
  "supercategory": str,
}]
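To make the annotation structure concrete, the snippet below builds one COCO-style annotation as a Python dict (the values are invented for illustration) and resolves its category name:

```python
# A made-up annotation in the COCO detection format shown above.
annotation = {
    "id": 1,
    "image_id": 42,
    "category_id": 3,
    "segmentation": [[10, 10, 110, 10, 110, 60, 10, 60]],
    "area": 5000.0,
    "bbox": [10, 10, 100, 50],   # [x, y, width, height]
    "iscrowd": 0,
}
categories = [
    {"id": 3, "name": "car", "supercategory": "vehicle"},
]

# Resolve the category name and check the bbox area.
names = {c["id"]: c["name"] for c in categories}
x, y, w, h = annotation["bbox"]
print(names[annotation["category_id"]], w * h)  # car 5000
```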

Comparison with other detection systems

Object detection is a core problem in computer vision. Detection pipelines
generally start by extracting a set of robust features from the input image (Haar
[25], SIFT [23], HOG [4], convolutional features). Classifiers [35, 21, 13, 10] or
localizers [1, 31] are then used to identify objects in the feature space, running
either over the whole image or on a subset of regions in a sliding-window fashion
[34, 15, 38]. We compare the YOLO detection system to several top detection
frameworks, highlighting key similarities and differences.

Other major detection techniques include:

1. Deformable Parts Model (DPM)
2. Deep MultiBox
3. OverFeat
4. MultiGrasp
5. Other fast detectors.

All these techniques play a major role in the development of YOLO's system
components.

IV. PERFORMANCE ANALYSIS:

We evaluate image input and video input with YOLOv3, and real-time input with both
YOLOv2 and YOLOv3. Before starting, we must load the pretrained model weights and
model configuration files. We use Google Colab to run this on an online GPU, with
the following files:
 YOLOv2: yolov2.weights and yolov2.cfg
 YOLOv3: yolov3.weights and yolov3.cfg

We also use Google Colab as our IDE (integrated development environment) and
import the following libraries:
• Python 3.7
• NumPy
• imutils
• OpenCV
The table below shows the performance of the YOLOv3 application. For the
segmented-image comparison, red highlights are used to indicate the differences
between the two models.

Model | YOLOv3 | SSD
Accuracy | High | Low
Time (seconds/frame) | 0.84 ~ 0.9 | 0.17 ~ 0.23
Speed | Slow | Fast

Table: Comparison of single-image testing between YOLOv3 and SSD
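The per-frame times in the table can be converted to frames per second for an intuitive comparison; this is a simple worked calculation using the midpoints of the reported ranges:

```python
def fps(seconds_per_frame):
    """Convert seconds/frame to frames/second."""
    return 1.0 / seconds_per_frame

# Midpoints of the reported ranges: YOLOv3 ~0.87 s/frame, SSD ~0.20 s/frame.
print(round(fps(0.87), 2))  # ~1.15 FPS for YOLOv3 in this setup
print(round(fps(0.20), 1))  # 5.0 FPS for SSD
```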

 SSD:
Another one-stage approach, the Single Shot MultiBox Detector (SSD), was
published by W. Liu, C. Szegedy, et al. in late 2016 [4]. SSD uses VGG16 as its
base network, owing to its high-quality image classification, and then appends
convolutional feature layers that gradually decrease in size, so that it can
predict objects at different scales and aspect ratios. The core of SSD is to
apply small convolutional filters to feature maps to predict class scores and
offsets for a fixed set of default bounding boxes. This model has since been
improved and re-implemented by many researchers; for example, instead of VGG16
as the feature extractor, ResNet or MobileNet is used.

The table below compares the speed and accuracy of three YOLO versions:

Model | Speed | Accuracy
YOLOv1 | Low | Low
YOLOv2 (YOLO9000) | Fastest | Medium
YOLOv3 | Medium | Highest

4.1- Sample inputs and outputs


V. Conclusion and Future Use
In this project, we carried out a small-scale evaluation of speed and accuracy on
images, video, and real-time input, rather than going through the full mAP
computation, owing to time and financial limitations. Processing a large-scale
dataset requires a very powerful and expensive GPU, and even then it takes a long
time to complete. Besides, if we relied on the same dataset, for instance the same
pictures and data from COCO, the accuracy results would be almost the same as in
previous studies, because the same model would be trained on the same images;
those mAP results can already be found online. We therefore decided to use random
images, videos, and a real-time camera to measure the performance of the two
detectors directly. However, these data do not have ground-truth boxes marked in
annotation files, which is another reason we did not follow the standard mAP
computation. In the future, we might evaluate mAP on a dataset with provided
ground truth to verify that we can reproduce comparable results, or go further and
create our own small dataset, manually drawing ground-truth boxes with labels for
each object, so that we can evaluate our own trained model with the YOLO or SSD
architectures and gradually grow the data to a reasonable size, which would make
the evaluation more objective.

YOLO brings real-time object detection to the forefront. Object detection is
becoming indispensable in many areas: multifunctional robots and autonomous
machines, such as quadcopters, drones, and, soon, assistive devices, have an
increasingly urgent need for detection systems. For nanorobots, or for exploring
places humans have never seen, such as the deep sea or other planets, the
detection system will also have to learn to handle new object categories; in such
cases it needs to keep learning in the open world.

VI. REFERENCES
[1] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once:
Unified, real-time object detection," Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, 2016.
[2] J. Redmon and A. Farhadi, "YOLO9000: Better, faster, stronger," Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition,
arXiv:1612.08242 [cs.CV], 2017.
[3] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," arXiv
preprint arXiv:1804.02767 [cs.CV], 2018.
[4] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C.
Berg, "SSD: Single shot multibox detector," European Conference on Computer
Vision (ECCV), 2016, pp. 21–37.
[5] K. Pulli, A. Baksheev, K. Kornyakov, and V. Eruhimov, "Realtime computer
vision with OpenCV," ACM Queue, vol. 10, no. 4, pp. 40:40–40:56, Apr. 2012.
doi:10.1145/2181796.2206309.
[6] "Object detection," Wikipedia.
[7] Joseph Chet Redmon's personal website, "Survival Strategies for the Robot
Rebellion".
