AI Video Analytics for Surveillance Robots

Abstract

In our graduation internship, we address the challenge of deploying AI-powered video
analytics into video management software and of intelligent detection by robots for
surveillance applications. The project relies on deep learning in order to obtain good
results for the detection task: video inputs from visible cameras are processed by the
YOLOv3-tiny architecture. The main goal is the integration of such models into video
management software so that they can be used in industry, which is a challenging task.
To show the effectiveness of our proposed model, the detection process is applied in
real time on the PGuard robot manufactured by Enova Robotics, the host company of
this project. In total, this work can be summarized in two steps. The first step is to
develop a people detection model; the second step is to integrate the object detection
model into the PGuard robot by building a pipeline for the deployment phase and then
developing a plugin in the video management software that can trigger alarms based
on detections in the monitored area.

Keywords: Computer Vision, Deep Learning, object detection, YOLO, visible and
thermal cameras, deployment, intelligent video analytics, video processing service.

Résumé
In our end-of-studies internship, we focus on the challenge of deploying AI-powered
video analytics into video management software and on intelligent detection by robots
for surveillance applications. In this project we intend to use such networks in order
to obtain good results for the detection task. Using video inputs from visible cameras,
the process is built on the YOLOv3-tiny architecture. The main objective is the
integration of such models into video management software so that they can be used
in industry, which is a difficult task. To show the effectiveness of our proposed model,
the detection process is applied in real time on the PGuard robot manufactured by
Enova Robotics, the host company of this project. In total, this work can be
summarized in two steps. The first step is to develop a people detection model; the
second step is to integrate the object detection model into the PGuard robot by
building a pipeline for the deployment phase and then developing a plugin in the video
management software that can trigger alarms based on detections in the monitored
area. Keywords: Computer Vision, Deep Learning, object detection, YOLO, visible
and thermal cameras, deployment, intelligent video analytics, video processing service.
Acknowledgement

The internship opportunity I had with Enova Robotics was a great chance for
learning and professional development. I am grateful for the chance to meet so many
wonderful people and professionals who guided me through this internship period.
I would like to express my gratitude to my supervisors for their continuous support
and encouragement throughout my graduation project; this project would not have
been possible without their assistance. I take this opportunity to express my deepest
gratitude and special thanks to Mr. Amir Ismail, my professional supervisor, who
guided me throughout this project and took the time to listen and advise, which gave
me confidence. His support has been invaluable in helping me develop both personally
and professionally. I also wish to record my deepest sense of gratitude to my university
supervisor, Mr. Sami Achour, for the help and advice concerning the missions described
in this report, for the time he devoted to us during the various follow-ups, and for his
contribution to the writing of this report.

Lastly, my family deserves endless gratitude. I also express my deepest thanks to my
friends for their advice and guidance, and I gratefully acknowledge their contribution.
I see this opportunity as a major milestone in my career development. I will strive to
use the skills and knowledge I have gained in the best possible way, and to keep
improving them, in order to attain my desired career and life goals.
Contents

Contents i

List of Figures iv

List of Tables vi

General Introduction 1

1 General presentation 3
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1 General context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Presentation of the host organization . . . . . . . . . . . . . . . . . . . 3
1.2.1 Enova Robotics . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.2 Sectors of activity . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Project context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3.1 Definitions and concepts . . . . . . . . . . . . . . . . . . . . . . 7
1.3.2 Problem statement . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3.3 Existing solutions . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3.3.1 VEER . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3.3.2 Openpath . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.3.3 BriefCam . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3.4 Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4 Project management methodology . . . . . . . . . . . . . . . . . . . . . 12
1.4.1 Unified Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.4.2 Two Tracks Unified Process . . . . . . . . . . . . . . . . . . . . 12
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2 Theoretical study 15
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.1 Deep learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.1.1 Neural network . . . . . . . . . . . . . . . . . . . . . . . . . . . 15


2.1.2 Activation functions . . . . . . . . . . . . . . . . . . . . . . . . 16


2.1.3 Cost function . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1.4 Neural networks learning . . . . . . . . . . . . . . . . . . . . . . 17
2.1.5 Optimizers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2 Computer vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.1 Object detection . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.2 Convolutional Neural Network . . . . . . . . . . . . . . . . . . . 19
2.2.3 Deployment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.3 The state-of-the-art of object detection . . . . . . . . . . . . . . . . . . 20
2.3.1 Two-stage methods . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.3.1.1 R-CNN . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.3.1.2 Fast R-CNN . . . . . . . . . . . . . . . . . . . . . . . 22
2.3.1.3 Faster R-CNN . . . . . . . . . . . . . . . . . . . . . . 23
2.3.2 One-stage methods . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3.2.1 Single Shot Multi-Box Detector (SSD) . . . . . . . . . 24
2.3.2.2 You Only Look Once(YOLO) . . . . . . . . . . . . . . 25
2.3.2.3 YOLOv3-tiny architecture . . . . . . . . . . . . . . . . 28
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

3 Preliminary and conceptual Study 30


Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.1 Functional branch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.1.1 Actors identification . . . . . . . . . . . . . . . . . . . . . . . . 30
3.1.2 Functional needs . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.1.3 Use cases diagram . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2 Technical branch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.2.1 Technical needs and criteria . . . . . . . . . . . . . . . . . . . . 33
3.2.2 Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3 Proposed solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.4 Conceptual study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.4.1 Deployment diagram . . . . . . . . . . . . . . . . . . . . . . . . 38
3.4.2 Activity diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.4.3 Sequence diagrams . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.4.3.1 Sequence diagram of ”Authenticate” . . . . . . . . . . 42
3.4.3.2 Sequence diagram of ”Start auto monitoring” . . . . . 42
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43


4 Realization 44
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.1 People detection model . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.1.1 Development environment . . . . . . . . . . . . . . . . . . . . . 44
4.1.2 Data source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.1.2.1 COCO dataset . . . . . . . . . . . . . . . . . . . . . . 45
4.1.2.2 Data preparation . . . . . . . . . . . . . . . . . . . . . 46
4.1.3 Detection phase . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.1.3.1 Implementation and configuration details . . . . . . . 47
4.1.3.2 Training and testing . . . . . . . . . . . . . . . . . . . 48
4.2 Deployment phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.2.1 Hardware tools . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.2.2 Software tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.2.2.1 JetPack SDK . . . . . . . . . . . . . . . . . . . . . . . 50
4.2.2.2 DeepStream SDK . . . . . . . . . . . . . . . . . . . . . 51
4.2.2.3 TensorRT . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.2.3 VPS Toolkit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.2.4 Deployment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.3 Integration into Milestone XProtect . . . . . . . . . . . . . . . . . . . 55
4.3.1 General presentation of MIP SDK . . . . . . . . . . . . . . . . . 55
4.3.2 Development environment . . . . . . . . . . . . . . . . . . . . . 56
4.3.2.1 Hardware environment . . . . . . . . . . . . . . . . . . 56
4.3.2.2 Software environment and technologies . . . . . . . . . 57
4.3.3 VMS Plugin development . . . . . . . . . . . . . . . . . . . . . 59
4.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.4.1 Authentication interface . . . . . . . . . . . . . . . . . . . . . . 60
4.4.2 Milestone XProtect Smart Client home interface . . . . . . . . . 60
4.4.3 Plugin interface . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.4.4 Alarm manager interface . . . . . . . . . . . . . . . . . . . . . . 62
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

General conclusion 64

List of Figures

1.1 Enova Robotics Logo[1] . . . . . . . . . . . . . . . . . . . . . . . . . . 4


1.2 PGuard Robot [2] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Mini-Lab robot [3] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Ogy robot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.5 Covea robot [4] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.6 AGV robot [5] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.7 VEER plugin [7] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.8 2TUP methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.1 Neural network formula [12] . . . . . . . . . . . . . . . . . . . . . . . . 15


2.2 Neural network layers [13] . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3 Activation function samples for hidden layers [14] . . . . . . . . . . . . 16
2.4 Softmax activation function [15] . . . . . . . . . . . . . . . . . . . . . . 17
2.5 Gradient descent [16] . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.6 CNN architecture[18] . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.7 R-CNN architecture [19] . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.8 Comparison between R-CNN and Fast R-CNN [20] . . . . . . . . . . 23
2.9 Faster R-CNN architecture [21] . . . . . . . . . . . . . . . . . . . . . . 23
2.10 Comparison of test-time speed [22] . . . . . . . . . . . . . . . . . . . . 24
2.11 Framework of the SSD[23] . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.12 YOLO algorithm steps [24] . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.13 Illustration of You Only Look Once (YOLO) . . . . . . . . . . . . . . . 27
2.14 The network structure of Tiny-YOLO-V3. [1] . . . . . . . . . . . . . . 29

3.1 Global use cases diagram . . . . . . . . . . . . . . . . . . . . . . . . . . 32


3.2 Hikvision logo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3 Genetec Security Center logo . . . . . . . . . . . . . . . . . . . . . . . 34
3.4 Milestone XProtect logo . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.5 IVA and video management components . . . . . . . . . . . . . . . . . 37
3.6 Solution architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.7 Deployment diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39


3.8 Activity diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41


3.9 Sequence diagram of ”Authenticate” . . . . . . . . . . . . . . . . . . . 42
3.10 Sequence diagram of ”Start auto monitoring” . . . . . . . . . . . . . . 43

4.1 Google Colaboratory LOGO . . . . . . . . . . . . . . . . . . . . . . . . 45


4.2 annotations file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.3 obj.data file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.4 training command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.5 testing command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.6 NVIDIA Jetson Nano [33] . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.7 NVIDIA JetPack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.8 NVIDIA DeepStream . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.9 DeepStream SDK [35] . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.10 Architecture of the NVIDIA DeepStream reference application [42] . . . 52
4.11 NVIDIA TensorRT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.12 VPS overview [30] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.13 make command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.14 deepstream command . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.15 running VPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.16 MIP SDK architecture [29] . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.17 ConceptD 500 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.18 Milestone XProtect VMS [32] . . . . . . . . . . . . . . . . . . . . . . . 57
4.19 Visual Studio logo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.20 StarUML logo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.21 .Net framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.22 WPF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.23 Login interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.24 Home interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.25 Menu in Smart Client . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.26 Plugin interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.27 Alarm manager interface . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.28 Detection on thermal camera . . . . . . . . . . . . . . . . . . . . . . . 63

List of Tables

2.1 Comparison of different methods in FPS and mAP [25] . . . . . . . . . 28

4.1 The configurations in cfg file . . . . . . . . . . . . . . . . . . . . . . . . 48

General Introduction

Over the next decade, robotics is expected to reach into many aspects of everyday life.
This technology has the potential to change the way we live and work and to raise
living standards. Over time, its influence will keep growing, and so will the interaction
between robots and humans. Between the 1960s and the 1990s, most robots and robotic
systems were limited to industrial applications; going forward, robotics will have a
major impact on many sectors such as the military industry, healthcare, customer
service, transport and logistics.

Robotics and artificial intelligence are increasingly used in security, as they can provide
a high level of protection while also reducing costs. For example, robots can patrol and
monitor areas, while artificial intelligence can analyze raw data from security cameras
and identify potential threats. Artificial intelligence is therefore being used to create
more sophisticated and effective surveillance systems.

Despite the promise of machine learning and artificial intelligence, more than 80%
of data science projects never make it to production. Having the right machine learning
models and services is not enough to do machine learning at scale; they must also be
placed in a secure, operationally performant, fully featured, cost-effective system with
the right access controls in order to achieve the desired business results.

It is in this context that our end-of-studies project, hosted by the company Enova
Robotics, takes place. It aims to set up a solution for integrating AI-powered video
analytics applications into a video management system in order to improve security
and efficiency and to reduce costs. The solution specifically targets the Pearl Guard, a
mobile, intelligent and autonomous security robot whose main mission is to secure
industrial sites and detect intrusions. The Pearl Guard can track an intruder by
transmitting the intruder's precise location in real time along with a video stream and
a heat stream.

This report is divided into four chapters. The first chapter presents the general context
of the project, in which we focus on the work to be carried out and study the related
works. The second chapter presents the theoretical study, which introduces deep
learning and computer vision and presents the state of the art of object detection. The
third chapter details the preliminary requirements of the study that answer the problem
statement, as well as the conceptual study, with diagrams that illustrate the proposed
solution. The last chapter presents the various human-machine interfaces developed,
together with implementation details and the results of our solution. We end this
report with a conclusion and perspectives.

Chapter 1

General presentation

Introduction
In the first chapter, we start with a brief overview of the host organization. Then,
we highlight the context and issues of the project and describe its objectives. Finally,
we define the management methodology we used.

1.1 General context


The topic covered in this report is the deployment of AI-powered video analytics
into XProtect using the NVIDIA DeepStream SDK and the Milestone VPS toolkit. It
was developed as part of a graduation internship within the Enova Robotics company
in order to complete our academic training.

1.2 Presentation of the host organization


1.2.1 Enova Robotics
Enova Robotics is an innovative company, created in May 2014, specialized in the
development, production and marketing of autonomous mobile robots. Enova Robotics
is a pioneer in this field in Africa and the Arab world. The company develops and
markets its own mobile solutions that meet needs in various sectors: health, security
and surveillance, and marketing.


Figure 1.1: Enova Robotics Logo[1]

1.2.2 Sectors of activity


Since its creation, Enova Robotics has been committed to solutions dedicated to the
development of mobile robots as well as research and development projects in robotics
in various fields such as security, health, logistics, marketing, education and industrial.
Below are some examples of these robots.
• Pearl Guard (PGuard):
The robot represented in figure 1.2 is an autonomous mobile robot capable of
navigating without interruption for more than 8 hours. Its state-of-the-art equipment,
thermal camera, on-board computer and remote communication kit allow it to
patrol and detect intrusions in complete autonomy, and allow the user to carry
out remote interventions. Pearl Guard can track an intruder by transmitting in
real time its precise location as well as a video stream and a heat stream. Pearl
Guard is efficient, robust and autonomous, ensuring optimal security at deployment
sites. Indeed, this robot attracted a lot of attention nationally and internationally
during the COVID-19 crisis, when it was deployed to patrol the neighborhoods
of the Tunisian capital to ensure that people observed the confinement required
by the authorities as part of the fight against the coronavirus.

Figure 1.2: PGuard Robot [2]


• Mini-Lab:
Mini-Lab as shown in the figure 1.3 is a robot designed by teachers for teachers.
It is a medium-sized robot optimized for indoor applications. It was born after an
experience of more than a decade in teaching and research in the field of mobile
robotics. Mini-Lab offers the best compromise between robustness and economic
competitiveness. For a complete teaching experience, Mini-Lab comes with pre-
defined labs and can be simulated on Matlab and Gazebo. Its control architecture
is open-source and is based on the Robot Operating System (ROS).

Figure 1.3: Mini-Lab robot [3]

• Ogy:
Ogy, as shown in figure 1.4, is a home security robot developed by Enova
Robotics in partnership with other Tunisian companies, Chifco and OPCMA. It
is a companion robot that protects and monitors the home and can communicate
with connected objects while acting in real time.

Figure 1.4: Ogy robot


• Covea:
Covea, as represented in figure 1.5, is a telepresence robot designed to help elderly
people at home. It provides continuous monitoring and tele-vigilance, making it
easier for doctors to follow up on and assist people who are far away. This model
has been deployed in one of the main Tunisian hospitals caring for patients with
COVID-19, in order to limit contact between caregivers and patients and to
improve remote, synchronous exchanges between patients and their families.

Figure 1.5: Covea robot [4]

• AGV:
AGV, as shown in figure 1.6, is an autonomous mobile cart designed to transport
payloads from warehouses to production lines in a wide range of industries.
The AGV is used by manufacturers to automate their internal transport and
logistics. Depending on the customer's business needs, custom top modules can
be mounted on the AGV to support different types of payloads.


Figure 1.6: AGV robot [5]

1.3 Project context


After the description of the general context of the project and the presentation of
the host organization, we outline in this part our problem statement as well as our
objectives. Also, we present the existing solutions.

1.3.1 Definitions and concepts


First of all, we define the different terms and concepts on which our project is based
to better understand the technologies used. We start by defining the keywords involved
in our solution.

• Video Management System (VMS): also known as video management software
plus a video management server. It is the component of a surveillance system
whose main missions are to collect video from cameras and other sources, store
it on a storage medium, and make these recordings easy to access. Some of
the most recommended video management systems include Genetec, Milestone,
Verint, and Nice.

• Video Processing Service (VPS): video processing is a form of signal processing,
in particular image processing, which often employs video filters and where the
input and output signals are video files or video streams. This service enables us
to analyze the video streams coming from the cameras and to apply deep learning
models to them.

• MLOps: (a combination of "machine learning" and "operations") is an engineering
approach that helps data scientists and engineers collaborate and communicate
in order to manage the production lifecycle of machine learning systems. Like
DevOps, MLOps has two key responsibilities:

1. bringing machine learning applications to production quickly and reliably;
2. ensuring that machine learning applications are operational 24 hours a day,
seven days a week, while meeting all functional and non-functional requirements.

1.3.2 Problem statement


AI-based intelligent video analytics is deployed across a wide range of industries
to increase operational efficiency and safety (intrusion detection, parking management,
site inspection, etc.). Since the PGuard robot patrols high-risk, limited-access
environments (nuclear plants, military areas, airports, etc.), its video streams must be
analyzed efficiently in order to detect unusual and strange behaviors. To address this
need, an efficient AI-based video analytics application is required, and it needs to be
integrated with a Video Management Software (VMS) in an end-to-end workflow.
By using deep learning, video analytics can be applied to automatically identify and
track people and objects in a video feed. This can be deployed to improve security by
providing real-time alerts when someone or something enters a restricted area. In addi-
tion, detecting in real-time objects that are moving and then sending alarms to security
agents based on a specific event is crucial for video management systems, among other
research areas.
In traditional surveillance systems, cameras are monitored by humans, so the
effectiveness of the surveillance system depends on careful human observation, which
can be difficult given the large number of cameras and the volume of traffic. Humans
can easily recognize different types of objects in videos or images, whereas algorithms
and computer programs are highly dependent on the type of data. Conditions such as
weather and lighting play an important role in making the task easier or harder. The
learning phase in deep learning also depends on the number of samples: the larger the
training set, the more accurate the performance. In addition, objects come in different
types and sizes, so the chief challenge is for the robot to identify moving objects in
their various forms. The PGuard robot is responsible for monitoring an area and
triggering alarms when something suspicious or dangerous occurs.

1.3.3 Existing solutions


The study of existing solutions is an essential step when starting any project. It
allows us to determine the strengths and weaknesses of current systems in order to
avoid their shortcomings, stimulate critical thinking and promote creativity.


The products analyzed here are intelligent video analytics platforms that integrate
with a video management system. We chose to study the following systems:

1.3.3.1 VEER

VEER is a cutting-edge video analytics platform coupled with industry-leading video
management software (Milestone XProtect) for IP network-based video surveillance,
making cities and companies safer and smarter.
VEER delivers security solutions by converting raw video footage into actionable
intelligence using AI-powered technology for different customers, such as governments
and traffic-monitoring operators [6]. Governments require the latest technology to
provide security for their nations: artificial intelligence combined with security cameras
monitors movements to provide an additional layer of security, and even a single
security camera can help make streets safer for traffic. Cities may not sleep, but
people do.
Figure 1.7 below shows the interface of the VEER plugin integrated with the Milestone
video management software. As shown in the picture, many filters can be applied to
the video streams, such as person, car, truck, etc.

Figure 1.7: VEER plugin [7]

• Strengths:
VEER offers a number of features that make it a valuable tool for businesses,
including customer segmentation, the creation and management of customer profiles,
and integration with other business applications. Among its features and services
are gun detection and alerting, people counting, speed detection for traffic
monitoring, etc.


• Weaknesses:
There are several potential weaknesses of the VEER solution, which include:
1. Lack of flexibility: The VEER solution is designed to be used in a specific way,
and it may not be possible to adapt it to different situations or needs.
2. Limited scalability: The solution may not be able to scale up to meet the needs
of a larger organization.
3. Dependence on technology: The solution is reliant on technology, and if there
are technical problems, the solution may not be able to function properly.
4. Lack of customization: The solution may not be able to be customized to meet
the specific needs of an organization.

1.3.3.2 Openpath

Openpath can be integrated with multiple leading video security solutions for greater
visibility and powerful, real-time analytics [8].
For more efficient risk reduction, video management software (VMS) combined with
physical access control creates a unified solution with best-in-class technology partners.
Integrating Openpath and remote video management software is a smarter approach to
IP surveillance because it provides visual context for all access control events.

• Strengths:
There are numerous benefits of an integrated video surveillance and access control
system.
1. Improved safety: Reduce loss-prevention issues by visualizing credentialed
users and controlling zone capacity with video occupancy tracking features.
2. Remote monitoring and management: Visually confirm or query entrances
from anywhere using a phone or tablet and no additional servers. With remote
lock and unlock capabilities, you can solve problems and manage access events.
3. Intelligent response: Context-driven security benefits from real-time alerts and
reporting, which leads to better decision-making and issue response.
4. Reduced administrative and IT hassle: Provisioning and installation are simple,
with no drain on IT resources.

• Weaknesses:
1. Potential security risks: the solution may pose security risks, as it stores
sensitive information in a cloud-based service.
2. Lack of customization: the Openpath solution may not be adaptable to an
organization's specific requirements.


1.3.3.3 BriefCam

BriefCam is a video content analytics platform that enables users to quickly and
easily identify, review and analyze video footage [9]. The platform provides a range of
features and tools that make it easy to search and review video footage, as well as to
identify and track objects and people in it.

• Strengths:
1. Faster filtering and sorting experience.
2. Security: customers can define their own database credentials.
3. Multilingual: multiple languages are available in the user interface.
4. Easy to use: to help users navigate the BriefCam system, interactive exercises
have been added to the user training in their portal.

• Weaknesses:
There are several weaknesses of the BriefCam video content analytics platform
that should be considered before using it.
1. The platform is not able to process video in real-time, which can be a problem
if you need to analyze footage as it happens.
2. The platform is not very accurate when it comes to identifying objects and
people in video footage, which can lead to false positives or false negatives when
trying to detect specific activity.
3. The BriefCam video content analytics platform is quite expensive, which may
make it unaffordable for some users.

1.3.4 Objective
The objective of this internship is to build a solution for integrating and deploying
deep learning models with a video management system. The system receives video
streams from the robot's different cameras and forwards them to the intelligent video
analytics component, which is responsible for the video processing part. The video
processing service should be able, for example, to detect specific objects such as persons
or cars from different cameras using deep learning models, and then return metadata
to the video management software to be read and analyzed. In the proposed solution,
we focus on object detection to test our pipeline: the aim is to analyze the video
sequences and trigger alarms to the user when something occurs. A bounding box
should be drawn around each detected object for as long as it is present in the video,
with its class name above it. We focus on people detection as an example of object
detection. The alarm is shown in the Smart Client interface, and the recorded videos
are displayed and saved. This helps determine the state of the monitored area.

1.4 Project management methodology


Project management is the discipline of planning, organizing, motivating, and
directing resources in order to satisfy particular goals and success criteria. Picking the
appropriate approach therefore has an impact on the entire workflow as well as on team
communication, and different project management approaches have their own sets of
benefits and drawbacks depending on the project type.
In this section we present the Two Tracks Unified Process (2TUP) development
methodology that we adopted throughout our project, for reasons discussed further
below.

1.4.1 Unified Process


The unified process [10] is a software development process that groups the activities
to be performed to transform the user’s requirements into a software system. The basic
characteristics of the unified process are as follows:

• Iterative and incremental:


The UP is an iterative and incremental development process. Through iteration,
errors and misunderstandings can be found earlier.

• Architecture-centric:
Different views are used to describe the system architecture. The architect pro-
ceeds in stages, starting by defining a simplified architecture that meets the prior-
ity requirements, then defining the subsystems more precisely from the simplified
architecture found earlier.

• Risk-focused:
Identify risks and maintain a list of risks throughout the project.

1.4.2 Two Tracks Unified Process


The 2TUP is a software development process that implements the Unified Process
method. It offers a Y-shaped development cycle, which separates the technical aspects
from the functional aspects [11].
It begins with a preliminary study, which essentially consists of identifying the actors
who will interact with the system to be built and the messages that the actors and the
system exchange, producing the specifications, and modeling the context (the system
is a black box, the actors surround it and are connected to it, and the axis that links
an actor to the system carries the messages the two exchange, with their direction).
The process then revolves around three essential phases:

• A technical branch:
The technical branch capitalizes on technical know-how and/or technical con-
straints. The techniques developed for the system are independent of the functions
to be performed.

• A functional branch:
The functional branch capitalizes on the knowledge of the business of the company.
This branch captures functional needs, which produces a model focused on the
business of end users.

• An implementation phase:
The implementation phase consists of bringing the two branches together, allowing
application design to be carried out and finally the delivery of a solution adapted
to the needs.


Figure 1.8: 2TUP methodology

Conclusion
This chapter is the cornerstone of our project: we have defined the scope of our study
and stated the problem in order to specify our objectives. The issues identified enabled
us to prepare a sound design for the improvements we will add to the proposed solution
so that it meets our needs. In the following chapter we present the theoretical study,
consisting of an introduction to deep learning and computer vision as well as the
state of the art of object detection.

Chapter 2

Theoretical study

Introduction
The objective of this chapter is to define and introduce the field of deep learning
and then present the state-of-the-art of object detection. We have divided this chapter
into three sections. In section one, we introduce the field of deep learning, where we
describe its general concept and how neural networks learn. Section two refers to the
computer vision part in which we define the concept of object detection. Finally, in
the last section, we present the state of the art of object detection and the different
architectures.

2.1 Deep learning


Deep learning is a sub-field of machine learning, and neural networks make up the
backbone of deep learning algorithms. It is concerned with algorithms inspired by the
structure and function of the brain, called artificial neural networks.

2.1.1 Neural network


Neural networks, and more specifically artificial neural networks (ANNs), mimic the
human brain through a set of algorithms. At a basic level, a neural network is comprised
of four main components: inputs, weights, a bias or threshold, and an output. Similarly
to linear regression, the algebraic formula is the one shown in figure 2.1 and written
out just below it.

Figure 2.1: Neural network formula [12]
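Written out explicitly (a standard formulation rather than one quoted from this report),
the output of a single neuron with inputs x_i, weights w_i, bias b and activation
function f is:

\[
\hat{y} = f\Big(\sum_{i=1}^{n} w_i x_i + b\Big)
\]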


A network may have three types of layers: input layers that take raw input from the
domain, hidden layers that take input from another layer and pass output to another
layer, and output layers that make a prediction as shown in the figure 2.2.

Figure 2.2: Neural network layers [13]

2.1.2 Activation functions


An activation function in a neural network specifies how the weighted sum of the
inputs is transformed into an output from a layer's node or nodes. Intuitively, a neuron
"lights up" when its activation value is high.

• Activation for hidden layers: in the hidden layers of a neural network, a
differentiable nonlinear activation function is typically used. As a result, the model
can learn more complex functions than a network trained with a linear activation
function. Figure 2.3 presents the Sigmoid, TanH and ReLU functions.

Figure 2.3: Activation function samples for hidden layers [14]

• Activation for output layers: There are three activation functions you may
want to consider for use in the output layer:

– Linear


– Logistic (Sigmoid)
– Softmax

Figure 2.4 represents the softmax activation function. This function is well suited to
classification problems, especially multi-class classification, because it returns a
"confidence score" for each class. Since these scores are probabilities, the outputs of
the softmax function add up to 1, as formalized just after figure 2.4.

Figure 2.4: Softmax activation function [15]
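For reference, the standard definition of the softmax function over K scores
z_1, ..., z_K (our own addition, consistent with the description above) is:

\[
\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \qquad
\sum_{i=1}^{K} \mathrm{softmax}(z)_i = 1 .
\]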

2.1.3 Cost function


A cost function is the error representation in machine learning-based approaches. It
shows how our model is behaving compared to the ground-truth values or actual output
labels. The cost function is the average of the loss functions of the entire training set,
whereas the loss function computes the error for a single training example. It’s a metric
for ”how well” a neural network performed in relation to a given training sample and
expected output. Because it rates how well the neural network performed as a whole,
it is a single value rather than a vector.
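As an illustration (a standard example, not a cost function prescribed by this report),
the mean squared error cost over m training examples, with predictions ŷ and labels y,
averages the per-example squared losses into a single value:

\[
C = \frac{1}{m}\sum_{k=1}^{m}\big(\hat{y}^{(k)} - y^{(k)}\big)^{2} .
\]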

2.1.4 Neural networks learning


Learning means finding the weights and biases that reduce the value of the cost
function. If we consider the cost as a function of its parameters and try to find its
minimum, an explicit solution is not always feasible for really complicated functions
such as ours, because there may be multiple local minima. The gradient of a function
gives the direction of steepest ascent, so taking the negative of the gradient gives the
direction of the step that decreases the function most quickly. The algorithm for
minimizing the function is therefore to compute this gradient, take a small step
downhill, and repeat that over and over. The cost function involves an average over
all of the training data, so minimizing it means better performance on all of those
samples.
Backpropagation is the algorithm for efficiently computing this gradient, and it is the
heart of how a neural network learns. The majority of deep neural networks are
feed-forward, meaning data flows from input to output in one direction.
Backpropagation, on the other hand, propagates the error in the opposite direction:
it allows us to calculate and attribute each neuron's error, so that we can fine-tune
and fit the algorithm appropriately.
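In symbols, one such downhill step updates the parameters θ with a learning rate η
(a standard formulation of gradient descent, added here for clarity):

\[
\theta \leftarrow \theta - \eta\,\nabla_{\theta} C(\theta) .
\]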

“Neurons that fire together wire together”[46]

2.1.5 Optimizers
Optimizers are algorithms used to minimize an error (loss) function or to maximize
efficiency. They are mathematical procedures that depend on the model's learnable
parameters, namely its weights and biases, and they tell us how to change the weights
and learning rate of a neural network in order to reduce the losses. We introduce two
types of optimizers below.

• Gradient descent: gradient descent tweaks the parameters iteratively to drive a
given function towards its local minimum, as shown in figure 2.5. It iteratively
reduces the loss function by moving in the direction opposite to the steepest
ascent, relying on the derivatives of the loss function to find minima. It is easy
to implement and easy to understand [2].

Figure 2.5: Gradient descent [16]

• ADAM (Adaptive Moment Estimation): a technique that determines an adaptive
learning rate for each parameter. Like momentum it stores a decaying average of
past gradients, and like RMSProp and Adadelta it stores a decaying average of
past squared gradients [17]; as a result, it combines the benefits of both methods.
It is slow initially but picks up speed over time.


The remaining question is how to choose an optimizer. If the data is sparse, adaptive
methods such as Adagrad, Adadelta, RMSprop and Adam are preferable. In many
cases RMSprop, Adadelta and Adam have similar effects; Adam essentially adds bias
correction and momentum on top of RMSprop, so as the gradients become sparse,
Adam tends to perform better than RMSprop. A minimal sketch of a gradient-descent
update is given below.
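The following minimal sketch illustrates plain gradient descent on a toy one-parameter
least-squares problem. It is our own illustrative example (Python, NumPy and all
values here are assumptions), not code from the project:

import numpy as np

# Toy example: fit a single weight w so that y = w * x matches synthetic
# data generated with w = 2, using plain gradient descent on the MSE cost.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x

w = 0.0      # initial parameter
lr = 0.05    # learning rate (eta)

for step in range(200):
    y_hat = w * x                          # forward pass
    grad = np.mean(2.0 * (y_hat - y) * x)  # dC/dw of the mean squared error
    w -= lr * grad                         # step opposite to the gradient

print(round(w, 3))  # converges towards 2.0

An adaptive optimizer such as Adam follows the same loop but rescales each step using
running averages of past gradients and squared gradients instead of a fixed learning rate.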

2.2 Computer vision


Computer vision designates an artificial intelligence technique making it possible to
analyze images captured by equipment such as a camera. To many, computer vision is
the AI equivalent of human eyes and the ability of our brains to process and analyze
visual information.

2.2.1 Object detection


Object detection or recognition is a computer vision task that combines both clas-
sification and localization. In fact, image classification involves predicting the class of
one object in an image, while object localization refers to identifying the location of
one or more objects in an image and drawing a bounding box around them.

2.2.2 Convolutional Neural Network


Any discussion of object detection has to mention convolutional neural networks
(CNNs). CNNs have been used extensively to classify images because of their high
accuracy. A CNN uses a hierarchical model that builds a funnel-shaped network ending
in a fully connected layer, in which all neurons are connected to one another and the
output is produced. The CNN architecture has two parts, as shown in figure 2.6 and
sketched just after it:

• In a process known as feature extraction, convolutional layers separate and
identify the various features of an image for analysis.

• A fully connected layer uses the output of the convolution process to predict the
image's class from the features extracted in the previous stages.


Figure 2.6: CNN architecture[18]
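To make the two parts concrete, here is a minimal sketch of a CNN written with
PyTorch. The framework choice, the layer sizes and the class name TinyCNN are our
own illustrative assumptions, not the architecture used in this project:

import torch
import torch.nn as nn

# Minimal CNN: a convolutional feature extractor followed by a fully
# connected classifier, mirroring the two parts described above.
class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(          # part 1: feature extraction
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                     # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                     # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)  # part 2

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

model = TinyCNN()
scores = model(torch.randn(1, 3, 32, 32))   # one fake 32x32 RGB image
print(scores.shape)                          # torch.Size([1, 10])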

2.2.3 Deployment
Deep learning models are able to learn complex patterns in data and make predictions
about new data. Deploying deep learning models can help organizations automate
tasks, improve decision making, and gain insights from data. Deployment is one of the
last steps in the machine learning process, and also one of the most cumbersome and
time-consuming phases.
There are many ways to deploy deep learning models. Two popular options are:

1. Using a cloud service: the model is stored on a cloud service such as Amazon
AWS or Google Cloud Platform. Once the model is deployed, it must be made accessible
to users, for example by creating an API or by using a tool such as AWS Lambda, and
it then needs to be monitored to make sure it is working correctly. There are a few
reasons why cloud deployment may not be suitable here: hosting can be expensive,
managing and monitoring the model in the cloud can be difficult, and there is a risk
that the model may be compromised.
2. On the edge: edge AI refers to AI algorithms that run locally on a device such
as an edge node. Many applications require deep learning at the edge for real-time
inference and for privacy reasons. It significantly reduces the cost of communicating
with the cloud in terms of network bandwidth, latency and power consumption. Edge
devices, however, have limited memory, computing resources, and processing power,
which requires optimizing the deep learning network for embedded deployment, for
example by exporting it to an optimized format as sketched below.
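As a small, hedged illustration of preparing a model for embedded deployment (our
own sketch; the project's actual pipeline, based on DeepStream and TensorRT, is
described in chapter 4), a trained PyTorch module can be exported to the ONNX
interchange format, which edge runtimes can then optimize:

import torch

# Minimal model used only to illustrate the export step; the layer sizes
# and file name are assumptions, not values used in this project.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(8, 10),
)
model.eval()

dummy_input = torch.randn(1, 3, 32, 32)   # example input the model expects
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",                          # hypothetical output path
    input_names=["image"],
    output_names=["scores"],
    opset_version=11,
)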

2.3 The state-of-the-art of object detection


In this part, we present the state of the art of object detection. Since there are two
categories of object detectors, we have divided this state of the art into two sections.
One-stage methods prioritize inference speed; example models include YOLO, SSD,
and RetinaNet. Two-stage methods prioritize detection accuracy; example models
include R-CNN, Fast R-CNN, Faster R-CNN, and Cascade R-CNN. Section one covers
two-stage methods, while section two presents one-stage methods.

2.3.1 Two-stage methods


A RoI (Region of Interest) Pooling layer can separate the two stages of a two-stage
detector. One model is used to extract regions of objects, and a second model is used
to classify and refine the object’s localization.

2.3.1.1 R-CNN

The Region-based Convolutional Neural Network (R-CNN), published in 2014, was
the first model to apply deep learning to the task of object detection. The R-CNN
series is based on the concept of region proposals, which are used to localize objects
in an image. As described in figure 2.7, the overall pipeline is composed of three stages:

• Generate region proposals: the model must draw candidates of objects in the
image, independent from the category using a selective search algorithm. The
proposal regions are cropped and resized from the image.

• The second stage is a fully convolutional neural network that computes features
from each cropped and resized region.

• The final stage is a set of fully connected layers and SVMs that classify each
region and refine the region proposal bounding boxes.

Figure 2.7: R-CNN architecture [19]

The selective search algorithm is used to generate region proposals. To avoid the
problem of selecting a large number of regions, Ross Girshick et al. proposed a method
in which selective search is used to extract only 2000 regions from the image, referred
to as region proposals [3]. Selective Search is composed of 3 steps:

• Generate a large number of candidate regions for the initial sub-segmentation.

• Use greedy algorithm to recursively combine similar regions into larger ones.

• Create the final candidate region proposals using the generated regions.

The problems with R-CNN are the following: training the network takes a huge amount
of time, since 2000 region proposals must be classified per image; it cannot be used in
real time, because each test image takes approximately 47 seconds to process; and the
selective search algorithm is a fixed algorithm, so no learning happens at that stage,
which may result in the generation of poor candidate region proposals.

2.3.1.2 Fast R-CNN

Many variants of R-CNN followed, such as Fast R-CNN, Faster R-CNN, and Mask
R-CNN, which improved the task of object detection. The author of the original
R-CNN paper addressed some of its shortcomings in order to create a faster object
detection algorithm, named Fast R-CNN.
Unlike R-CNN, the initial input image is fed directly to the CNN to generate a feature
map, rather than passing each region proposal through the CNN. The regions of
proposals are then determined from the convolutional feature map, warped into squares,
and reshaped to a fixed size by a RoI pooling layer so that they can be sent to a fully
connected layer. Based on the RoI feature vector, a softmax layer predicts the class of
the proposed region and the offset values for the bounding box. So, with Fast R-CNN
we no longer pass 2000 region proposals through the CNN every time; instead, the
convolution is done only once per image and a feature map is generated from it.
From figure 2.8, we can clearly see that Fast R-CNN is faster than R-CNN in both
training and testing. In addition, excluding region proposals has a direct effect on the
performance of the algorithm by reducing the time during the testing session.


Figure 2.8: Comparison between R-CNN and Fast R-CNN [20]

2.3.1.3 Faster R-CNN

We can conclude that selective search is a slow and time-consuming process that
affects the performance of the network. As a result, Shaoqing Ren et al. developed
an object detection algorithm that does away with the selective search algorithm and
lets the network learn the region proposals [4]. Like Fast R-CNN, the input image is
passed directly to the convolutional neural network, but instead of applying the
selective search algorithm to the convolutional feature map to identify the region
proposals, another network is used to predict them. The predicted region proposals
are then reshaped using a RoI pooling layer, which is used to classify the image within
the proposed region and to predict the offset values for the bounding boxes.
Figure 2.9 describes the components of Faster R-CNN.

Figure 2.9: Faster R-CNN architecture [21]


It is composed of 3 parts:

• RPN: the Region Proposal Network is a fully convolutional network, trained
end-to-end, that simultaneously predicts object boundaries and objectness scores
at each position, i.e. it predicts the areas where an object can be found. The
first step is to generate anchor boxes, a set of predefined bounding boxes with
different sizes so as to catch objects of different sizes. The second quantity
computed is the IoU, the intersection over union, with a 0.5 threshold. Finally,
the output is a feature map over those anchor boxes. A small sketch of anchor-box
generation follows this list.

• Region of interest: since the feature maps have different sizes, RoI pooling is
applied to reduce all the features to a fixed size.

• Classifier and regressor: draw bounding boxes around the objects that have
been classified.
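As a rough sketch of what generating anchor boxes means (our own illustration; the
scales and aspect ratios below are made-up values, not those of Faster R-CNN), the
following function enumerates boxes centered on one feature-map location:

import itertools

def anchor_boxes(cx, cy, scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchors centered at (cx, cy), in pixels."""
    boxes = []
    for scale, ratio in itertools.product(scales, ratios):
        w = scale * ratio ** 0.5      # wider boxes for ratio > 1
        h = scale / ratio ** 0.5      # taller boxes for ratio < 1
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

print(len(anchor_boxes(100, 100)))   # 9 anchors per location (3 scales x 3 ratios)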

Figure 2.10 shows the difference in test-time speed between the different models.
Faster R-CNN is clearly much faster than its predecessors, and can therefore even be
used for real-time object detection.

Figure 2.10: Comparison of test-time speed [22]

2.3.2 One-stage methods


One-stage object detection models skip the region proposal stage of two-stage models
and run detection directly over a dense sampling of locations. These models usually
have faster inference, possibly at the cost of accuracy. Since there is no intermediate
region proposal task, the architecture is simpler and faster. In this section, SSD and
YOLO are used as examples of one-stage object detection models.

2.3.2.1 Single Shot Multi-Box Detector (SSD)

SSD is an abbreviation for Single Shot MultiBox Detector [5]. It is a single-stage
object detection approach that uses multibox to discretize the output space of bounding
boxes into a series of default boxes at varied aspect ratios and scales per feature-map
position, requiring only one shot to detect multiple objects present in an image.
SSD has a base VGG-16 network followed by multibox convolutional layers. The
VGG-16 base network of SSD is a standard CNN architecture for high-quality image
classification, used without its final classification layers as a feature extractor; additional
convolutional layers are then added for detection.
SSD does not split the image into a grid like YOLO; instead, it predicts the offsets of
predefined anchor boxes for every location in the feature map. Each box has a fixed
size and position in relation to its corresponding cell. The framework of SSD is shown
in figure 2.11.

Figure 2.11: Framework of the SSD[23]

The final bounding box prediction is obtained by adding the predicted offsets to the
default boxes, and the regression loss is computed to correct the offsets with respect to
the matched ground-truth bounding box. When a predicted box cannot be matched to
any ground-truth box, its regression loss is set to zero.
W. Liu et al. [5] proposed the SSD detection framework, which discretizes the output
into a default set of boxes (similar to Faster R-CNN anchor boxes) for every feature-map
location. The default boxes have various aspect ratios and scales to better match any
object shape in the image. In addition, SSD combines feature-map predictions at
multiple scales to better handle the scale of objects relative to the image. SSD is a
one-stage approach that removes the need for an object proposal step, which makes it
simpler and faster than Faster R-CNN.

2.3.2.2 You Only Look Once(YOLO)

The term ”You Only Look Once” is abbreviated as YOLO[6]. It’s a real-time object
detection algorithm that employs neural networks. This algorithm is well-known for
its trade-off between speed, accuracy, and ability to learn. It has been used to identify
traffic signals, persons, parking meters, and animals in a variety of applications. It
employs a CNN to detect objects in real time: a single run of the algorithm predicts
over the entire image, and the CNN simultaneously predicts multiple bounding boxes
and class probabilities.
The three techniques used by the YOLO algorithm are as follows:

• Residual blocks: the image is first divided into an S × S grid of cells; each cell is responsible for the objects whose center falls inside it.

• Bounding box regression: an outline that highlights an object in a picture is


known as a bounding box. To estimate the height, width, center, and class of
objects, YOLO uses single bounding box regression.

• Intersection Over Union (IOU): assuming that there are two bounding boxes, one green and one blue, where the blue box represents the predicted box and the green box represents the ground-truth box, YOLO uses the IOU between them to measure how well the predicted box surrounds the real object (a minimal computation sketch follows this list).
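As a minimal illustration of the IOU and confidence notions above (our own sketch, not part of any YOLO implementation; the boxes and the objectness value are arbitrary examples):

def iou(box_a, box_b):
    # Boxes are (x_min, y_min, x_max, y_max) in pixels.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)           # intersection area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)           # intersection over union

predicted, ground_truth = (50, 50, 150, 150), (60, 60, 160, 160)
p_object = 0.9                                              # predicted objectness probability
confidence = p_object * iou(predicted, ground_truth)        # Pr(Object) x IOU
print(round(confidence, 3))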

The final detection will be made up of distinct bounding boxes that exactly suit the
objects. The YOLO algorithm’s several steps are illustrated in the figure below.

Figure 2.12: YOLO algorithm steps [24]

Each of the S × S grid cells predicts B bounding boxes. The model produces a confidence score for each bounding box indicating the probability that the cell contains an object [7]. The score for each bounding box does not classify the type of object or identify which object matches the bounding box; it simply gives a confidence or probability value for the quality of the bounding box surrounding an object in each cell [8]. It is based on the intersection over union (IOU) between the predicted box and the ground-truth box and is defined as:


The confidence score = Pr (Object) × IOU

where Pr(Object) is the probability that an object is present. If no object exists in the cell, the confidence score should be zero. The model also produces four numbers to represent the position and dimensions of the predicted bounding box: its center (x, y), its width (w) and its height (h). YOLO, a unified one-stage detection model proposed by J. Redmon et al. [9], reframes object detection as a single regression problem. In YOLO, a single detection model is used for predicting
both object class probabilities and bounding box parameters in a single forward pass.
When compared to Faster-RCNN, this allows YOLO to save time. An illustration of
YOLO detection model is shown in figure 2.13

Figure 2.13: Illustration of You Only Look Once (YOLO)

YOLO v1 [9] is the first version of YOLO detectors that uses the Darknet framework
which is trained on the ImageNet dataset. The use of YOLO v1 was relatively limited.
The issue is primarily caused by the inability of this version to detect small objects
in images. The second version of YOLO was released [10] at the end of 2016. The improvements of this version mainly concern faster execution time, and it includes several more advanced components. Precisely, YOLO v2 improves the stability of the neural network, and the increase of the input image size improved the mean average precision (mAP) by up to 4%. Furthermore, YOLO v2 addressed the first version's problem of missed detections of small objects by dividing the image into 13 × 13 grid cells. This enables YOLO v2 to recognize and locate smaller objects in the image while remaining equally effective on larger objects. YOLOv3 is the third version of the YOLO family. It performs predictions at three scales to enable multi-scale detection, as in the FPN detector. The third version has become one of the most popular object detectors for the following reasons. For starters, YOLOv3 offers a good speed-accuracy trade-off and still runs in real-time. In addition,


the third version of the YOLO algorithm is much more precise in the detection of small
objects. It also makes the identification of classes more exact.

2.3.2.3 YOLOv3-tiny architecture

The YOLOv3 tiny model is a simplified version of the YOLOv3 model with less
mean average precision and more frames per second as it is shown in the table down
below 2.1.

Method        FPS   mAP (%)
YOLOv3         49   52.5
YOLOv4         41   64.9
YOLOv3-tiny   277   30.5
YOLOv4-tiny   270   38.1

Table 2.1: Comparison of different methods in FPS and mAP [25]

When speaking of architectures, there are three principal blocks: the backbone, the neck and the head. For YOLOv3-tiny we have:

- Backbone: Darknet-53 network [11]

- Neck: Multi-Scale Feature Pyramid Network [11]

- Head: YOLO layers [11]

YOLOv3 uses Darknet-53 as its backbone, which uses many 1×1 and 3×3 convolution kernels to extract features. YOLOv3-tiny reduces the number of convolutional layers: its basic structure has 13 convolutional layers and 6 max-pooling layers, and features are then extracted using a small number of 1×1 and 3×3 convolutional layers [12]. To achieve dimensional reduction, YOLOv3-tiny uses pooling layers instead of YOLOv3's stride-2 convolutional layers [13]. However, its convolutional layer structure still uses the same Convolution2D + BatchNormalization + LeakyReLU structure as YOLOv3. The image is first fed into Darknet-53 for feature extraction before being fed into the feature pyramid network for feature fusion. The results are then generated by the YOLO layers. In a one-stage detector, the head's role is to make the final prediction, which consists of a vector with the bounding box's width, height, class label, and class probability.
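To give a rough idea of the size of this prediction, the following sketch (our own, assuming the usual two YOLOv3-tiny detection scales for a 416×416 input, three anchors per scale and the single "person" class targeted later in this work) computes the number of predicted values per grid cell and boxes per scale:

num_classes = 1                      # only the "person" class in our case
anchors_per_scale = 3
attrs_per_box = 5 + num_classes      # x, y, w, h, objectness + class scores

for grid in (13, 26):                # the two detection scales of YOLOv3-tiny at 416x416
    print(f"{grid}x{grid} grid -> {grid * grid * anchors_per_scale} boxes, "
          f"{anchors_per_scale * attrs_per_box} values per cell")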
The network structure of YOLOv3-tiny is shown in Figure 2.14.


Figure 2.14: The network structure of Tiny-YOLO-V3. [1]

Conclusion
This chapter introduced the deep learning and computer vision domains. It also provided an overview of existing solutions in the field of object detection that use CNN architectures. Two types of detection algorithms were presented: two-stage and one-stage object detection algorithms, with different models for each.

Chapter 3

Preliminary and conceptual Study

Introduction
As explained in the first chapter, the needs correspond to the functionalities offered by our solution. We will describe the functional needs by actor and the non-functional needs that our project must meet. This phase is essential in the life cycle of any software, and more precisely with the Y methodology. We then present our proposed solution. Finally, we will detail the different elements of the design of our proposed solution in the conceptual study section.

3.1 Functional branch


In this section we study the functionalities provided by our solution. We divided this
section into three parts: actors identification, functional needs and use cases diagrams.

3.1.1 Actors identification


An actor is any person or entity likely to interact with a well-defined system. The analysis of the system needs showed that we have one and only one main actor, who is the end user of the graphical interface (for example, security agents). The GUI user is the first customer enjoying the different features. In fact, he would be able to monitor the robot in real-time without being physically present all the time. Finally, this agent must be able to visualize the current state of the robot and the area.

3.1.2 Functional needs


We will start with the functionalities that our system will provide. In fact, an in-
telligent video analytics system needs to be able to detect and track objects in a video
feed, identify unusual behavior, and generate alerts accordingly. It should be able to


work with live video streams in real-time, and be able to integrate with other security
systems.
The system has the potential to replace human security guards in a number of ways.

- First, video analytics can be used to automatically identify and track people and objects in a video feed. This means that it can be used to monitor an area for security purposes without the need for a human guard to constantly watch the footage.

- Second, video analytics can be used to raise an alarm in the event of a security breach. This could be done by detecting unusual patterns of movement or by recognizing specific objects that are not supposed to be in a particular area.

- Third, video analytics can be used to provide detailed reports of events that have occurred in an area. This could be used to investigate a security breach or to track the movements of people and objects over time.
Overall, video analytics has the potential to replace human security guards in a number of ways: it is more accurate than human guards, it can monitor a larger area, and it can provide detailed reports of events.
The process would be as follows:

- Monitoring the area by observing the 4 cameras of the PGuard robot.

- If a person is detected, an alarm is triggered.

- When an alarm is triggered, a recording is started on the source camera of the alarm for 25 seconds, with 5 seconds of pre-recording, for further observation by a human.

Moreover, building deep learning models that are capable of detection or tracking is not sufficient; the difficult task is to be able to use them in industry as well. This means being able to take these models and put them into a production environment so that they can be used by others. As a result, the main task is to build a pipeline and a workflow for deploying any kind of model so that it is ready to use. The integration part is therefore a necessity in any computer vision project, and it is challenging.

3.1.3 Use cases diagram


This part represents the functional point of view of the software system to be devel-
oped, i.e. the set of functionalities to be satisfied for the users of the system through
concepts of use cases, actors and their relationships (association, dependence and gen-

31
CHAPTER 3. PRELIMINARY AND CONCEPTUAL STUDY

eralization). A use case diagram typically contains the following:

- Actors: represent the users of the system. An actor can be a person, an organiza-
tion, or a piece of software.
- Use cases: represent the actions that users can perform with the system. A use
case should be a complete and self-contained description of an action that a user can
take.
- Associations: represent the relationships between actors and use cases. An asso-
ciation indicates that an actor can perform a use case.
- Generalizations: represent inheritance relationships between actors or use cases.
A generalization indicates that one actor or use case is a more specific type of another
actor or use case.

• Global use case diagram: a use case highlights a functionality, i.e. an interaction between the actor and the system. The use cases delimit the system, its functions and its relations with its environment. They therefore represent a way of using the system and make it possible to describe its functional requirements. The following figure 3.1 shows the general use case diagram.

Figure 3.1: Global use cases diagram


3.2 Technical branch


In this section, we will describe the technical needs and constraints. Specifications which do not express a function of the software, but rather constraints to be respected, are added to the functionalities that the software system to be developed must ensure.

3.2.1 Technical needs and criteria


These specifications can be technical, organizational or related to quality criteria.
In addition to the functionalities it provides, our future system must meet the following
criteria:
- The system must be easy to maintain, flexible and scalable in order to respond to
possible extensions required by the field of use.

- For our application, reliability and speed are two interdependent constraints: satisfying one affects the other. Our solution must guarantee a minimum error rate and try to reduce the response time as much as possible.

- In addition, we must cover all possible cases in order to avoid any unhandled exception. Finally, our solution must imperatively ensure security during the exchange of information between the VMS and the VPS and guarantee that unauthorized parties cannot access the video streams of the monitored sites.

3.2.2 Constraints
In this part, we will describe in detail the technical constraints that our solution must follow. First of all, our company uses a VMS called Milestone XProtect, so it is a necessity to integrate our solution into this video management software. In the section below, we present some VMS alternatives.
There are many video management software (VMS) products available on the market today. Some of the most popular are Milestone XProtect, Genetec Security Center, and Hikvision NVR. Each has its own unique set of features and benefits, so it is important to choose the one that is right for your specific needs.

• Hikvision NVR [26]: is a cost-effective VMS that is perfect for small busi-
nesses. It offers many of the same features as the other two VMSs, but it is more
affordable. Hikvision NVR is also very easy to use, so it is a good choice for


businesses that don’t have a lot of IT resources.

Figure 3.2: Hikvision logo

- One of the main strengths of Hikvision’s VMS is its ease of use. The system
is designed to be user-friendly and can be easily navigated by users of all levels
of experience. The VMS also offers a wide range of customization options, which
allows users to tailor the system to their specific needs.

- A weakness of Hikvision’s VMS is its reliance on proprietary hardware. The sys-


tem can only be used with Hikvision cameras and other Hikvision products. This
can be seen as a disadvantage by some users who may prefer to use third-party
products.

Overall, Hikvision’s Video Management System is a powerful and versatile surveil-


lance solution that offers a wide range of features and capabilities. It is easy to
use and can be customized to meet the specific needs of any organization.

• Genetec Security Center [27]: is a comprehensive VMS that is ideal for large
organizations. It offers advanced features such as video analytics, license plate
recognition, and integration with other security systems. Genetec Security Cen-
ter is also very scalable and can be customized to meet the specific needs of your
business.

Figure 3.3: Genetec Security Center logo


- GSC has a number of features that make it a powerful VMS, such as the abil-
ity to manage and monitor multiple security devices from a single interface, the
ability to create rules and alerts to notify users of potential security threats, and
the ability to integrate with other Genetec security products.

- However, GSC also has some weaknesses. One weakness is that it is a very com-
plex system, which can make it difficult to use and configure. Another weakness
is that it is not as widely compatible with third-party security devices as some
other VMSs.

• Milestone XProtect [28]: is a powerful and easy-to-use VMS that is perfect


for small to medium-sized businesses. It offers a wide range of features, including
remote viewing, event-based recording, and motion detection. Milestone XPro-
tect is also very scalable, so it can grow with your business.

Figure 3.4: Milestone XProtect logo

- One of the main strengths of XProtect Milestone VMS is its scalability. The
software can be used to manage small surveillance systems with just a few cam-
eras, or large enterprise-level systems with hundreds or even thousands of cameras.
Additionally, the software is easy to install and configure, and users can be up
and running in just a few minutes.

- Another strength of XProtect Milestone VMS is its compatibility with a wide


range of security cameras and other security devices. The software supports IP
cameras, analog cameras, and even drones, and can be used with a variety of other
security devices such as access control systems and alarm systems. This allows
users to create a comprehensive security solution that meets their specific needs.
- The main weakness of XProtect Milestone VMS is its cost. The software is not
cheap, and the cost can quickly add up for users who need to purchase additional
licenses for additional cameras or devices. Additionally, the software requires a


fair amount of technical knowledge to install and configure, which can be a barrier
for some users.

In this manner, we chose to work with Milestone XProtect since it is more scalable and the best fit for our needs. It offers a great deal of flexibility and customization. Basically, it provides an MIP SDK that enables you to develop your own features without the vendor's intervention. It also has a robust feature set that includes things like video analytics and alarm management. Furthermore, to go back in time, the hosting company had developed a PGMS, a "PGuard Management System", which it recently integrated into the Milestone XProtect VMS. This management system allows, on the one hand, the management and control of the various functionalities of the PGuard robot and, on the other hand, the control of the PGuard through devices connected to the interface.

3.3 Proposed solution


In this section, we will describe our proposed solution. After detailing the problem
and studying the existing solutions, we can now mention the objectives to be achieved
in our project.
The solution consists of developing a workflow for an intelligent video analytics solution. Basically, this solution is intended for security guards who control the PGuard robot, offering them a convenient interface to monitor the area through the robot's IP cameras connected to the interface. First, we developed a plugin integrated into Milestone XProtect, the VMS that the company already works with. Different video analytics capabilities can be attached to the VMS. Milestone recently introduced the VPS Toolkit, a set of tools that allows developers to connect an external compute platform using the open-source GStreamer framework to build custom video analytics pipelines. The VPS is now an integral component of the Milestone Integration Platform Software Development Kit (MIP SDK) [29]. This means that the video processing service can be co-located with the XProtect video management software, hosted on a separate off-site system, or even deployed in the cloud, which increases the flexibility of how such video analytics solutions can be deployed and integrated.
The figure 3.5 illustrates the general workflow of the video streams. The input is the
video streams from the robot cameras, forwarded to the video management system that
will transmit them to the intelligent video analytics to be processed and finally return
the video streams with the metadata related to the video streams.


Figure 3.5: IVA and video management components

Second, we develop the pipeline of the video processing service. The VPS [30] currently enables the analysis of live video streams: live video is passed into the environment where the VPS is deployed, and the video analysis application returns metadata and a video stream that are injected back into the VMS. Finally, the plugin integrated with the video management software reads the metadata returned from the analytics application to generate alarms for the user when something occurs in the monitored area; a recorded video is also made available to the end user for review.
The figure 3.6 shows the IVA, our VPS and the flow of video streams.


Figure 3.6: Solution architecture

3.4 Conceptual study


After analyzing the users' needs, we will detail in this section the various elements of the design of our proposed solution. This step is very critical in the development cycle of our application since it makes it possible, on the one hand, to satisfy the needs identified in the specification phase and, on the other hand, to prepare for the implementation.

3.4.1 Deployment diagram


The deployment diagram models the physical architecture of the hardware. This modeling must take into account three aspects: the software, the hardware, and the integration of the two.
This diagram describes the physical resources of the solution, i.e. it describes the nodes on which the system will operate. A node can be a computer, a person or a manual process, and each node can contain several components. Figure 3.7 illustrates our deployment diagram.


Figure 3.7: Deployment diagram

The solution then consists of three nodes:

- The command PC node: it contains the Milestone plug-in in the Smart Client software, which constitutes the graphical interface and allows interaction with the user. Thus, it groups all the commands used in our application.
- The Jetson Nano node: it contains the video processing service, which receives video frames via the HTTP protocol from the VPS Driver within the Recording Server [41]. It sends the metadata back to the VPS Driver to be stored in the database, visualized in the Smart Client and finally analyzed by our plug-in.
- The robot node: it contains the IP cameras which are accessed by the VMS.

3.4.2 Activity diagrams


Activity diagrams help focus on processing. They are therefore particularly suitable for modeling control flows and data flows, and they make it possible


to graphically represent the behavior of a method or the routing of use cases. The
following figure 3.8 represents a complete scenario of using our plugin in Milestone
XProtect and navigating within it.
The agent must launch the Milestone VMS by first checking the servers (Milestone
XProtect Management server and Milestone XProtect Recording server). Once the
server is launched, the agent can access the plug-in.
If the robot is connected, as well as the cameras, the agent can view the video feeds from the cameras and get an idea of the current status of the robot in real-time. The agent can activate intrusion detection to ensure optimal surveillance of the area where the robot performs its patrols by means of artificial intelligence. In case of intrusion detection, an alarm is generated to notify security agents in real-time, and the agent can finally stop this by disabling the functionality.


Figure 3.8: Activity diagram


3.4.3 Sequence diagrams


In what follows we present the sequence diagrams describing the different use cases
explained in the previous section. To facilitate the reading of the following sequence
diagrams, we must explain some notions that are present in them.

Management server: this is a server included in the Milestone system (Milestone XProtect Management Server); it must always be active in order to be able to access Milestone.

3.4.3.1 Sequence diagram of ”Authenticate”

This is the first step that takes place when the application is launched. According to the following figure 3.9, the user must launch Milestone and enter his connection parameters; the system then verifies the entered data with Milestone's XProtect Management Server. On success, the latter displays the home interface; in case of error, an error message is displayed.

Figure 3.9: Sequence diagram of ”Authenticate”

3.4.3.2 Sequence diagram of ”Start auto monitoring”

The sequence diagram presented by the following figure 3.10 is relative to the step
of starting the automatic surveillance. After the authentication of the user, and the


connection verification of the robot and the cameras, the user can open the Smart Client to access the plug-in, and the system activates the "Start" button. Once this button is clicked, the system sends the video streams from the robot's IP cameras to a VPService and then reads the metadata in real-time so that it can be analyzed to determine whether a person is detected at that moment. If a person is detected, an alarm is triggered in the user interface, together with the videos recorded at that specific time.

Figure 3.10: Sequence diagram of ”Start auto monitoring”

Conclusion
In this chapter we have detailed the main functionalities of our application which
we have illustrated by use case diagrams in order to ensure a better understanding and
a good mastery of our work as well as the conceptual study of our solution which will
be enriched among other things through deployment diagram, sequence diagrams and
an activity diagram, reflecting the static and dynamic aspects. In the next chapter we
will present the realization part and implementation of our project.

Chapter 4

Realization

Introduction
This last part of our project is very critical since it puts into reality all the the-
ory studied in the previous chapters. In this last chapter we will firstly present the
hardware and software environment in which our project was elaborated, indicating the
technologies used. Then, we will present the data used for building our model. Finally,
we conclude this chapter with the interfaces of our solution.

4.1 People detection model


In this section, we will present our approach for people detection based on a custom
model of YOLOv3 tiny. We give the implementation details about the steps followed
as well as the datasets that we have tried all along with the training process. Also, we
describe the experiments conducted to validate the proposed approach. The obtained
results are analyzed as well.

4.1.1 Development environment


Building and training our model was done using Google Colab.

• Google Colab: a web IDE for Python provided by Google, enabling machine learning experiments with storage in the cloud.


Figure 4.1: Google Colaboratory LOGO

4.1.2 Data source


Before deciding to train our custom model on the COCO dataset [31], we chose the CrowdHuman dataset [43]. Following the same steps and configurations, we concluded that this dataset is not a suitable choice for our people detection task, since it focuses on crowded, dense pedestrian scenes, even after cleaning the data and labelling it image by image. The training results using the CrowdHuman dataset were as follows: mAP = 29.5%, average loss = 4.
The major difficulty for training a YOLOv3 tiny custom model of person detection is
the choice of the database. The feature extractor block of YOLOv3 tiny uses the images
of the dataset as input and learns patterns and features that ultimately lead to a global
overview of the classes in the dataset. This model is to be trained to detect persons,
so it is essential to find a dataset that contains this class. To keep the learning process as accurate as possible, the images in the dataset must have the following characteristics:

• All the images in the dataset should contain at least one person.

• The persons in the images should be of different sizes, scales, shapes and heights.
For example, some of the images may contain very close persons that can be
appreciated perfectly in the image, but other persons may be very far away. Also,
images should contain persons from different camera angles.

To sum up, the dataset should cover persons in the different situations in which they can be found in real life (weather conditions, camera status, etc.), so that our model, which is intended for real-world applications, can detect them under all circumstances. The datasets for person detection assessed in our work are discussed in the next paragraphs.

4.1.2.1 COCO dataset

The dataset that we used for training our custom model is the MS COCO dataset [31]. In fact, MS COCO (Microsoft Common Objects in Context) is a large-scale image dataset of 328,000 samples of commonplace objects and humans with 80 object categories. Annotations from the dataset can be used to train machine learning models to recognize, classify, and describe objects.


Applying a pretrained model that is initially trained on a general object dataset, as in the case of COCO, may not be the most appropriate solution to our problem of person detection. For this reason, we propose to substitute the pretrained model with a custom model trained on the train set of our dataset to improve the accuracy of the detection results.
Since we are working with only one class, we need to keep only the samples with that category. Therefore, we used a tool called FiftyOne [44] in order to select images of the specific COCO category "person".
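As an illustration, this category filtering can be done with a few lines of FiftyOne (a sketch of our own; the split and the sample cap below are indicative values, not our exact export settings):

import fiftyone as fo
import fiftyone.zoo as foz

# Download only COCO samples that contain at least one "person" detection.
dataset = foz.load_zoo_dataset(
    "coco-2017",
    split="train",
    label_types=["detections"],
    classes=["person"],        # keep only samples containing this category
    max_samples=5000,          # illustrative cap
)
session = fo.launch_app(dataset)   # optional: browse the selected images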

4.1.2.2 Data preparation

Data preparation is the process of cleaning and organizing data so that it can be used for analysis. This may involve removing invalid data, filling in missing values, and reformatting data so that it is consistent. Data preparation is an important step before creating deep learning models because it can improve the accuracy and quality of the results.
Therefore, we used techniques such as labeling with the help of the Roboflow platform [45]. We started off with 5019 images; after the data cleaning procedure, we ended up with 4670 images, split as follows:

- Train set: 74%, 3.5K images

- Validation set: 15%, 700 images

- Test set: 11%, 502 images

Also, to make sure that all images have the same size, we resized them to 416×416.
In order for data to be useful, it must be properly labeled. This process is known as data labeling, and it is a critical part of any data-driven project. With the help of Roboflow, we ensured the annotation of all our images.
By the end of this process, we obtain a .txt file that has the same name as the corresponding image and contains the class and the bounding box annotations in YOLO format, as shown in figure 4.2 below.


Figure 4.2: annotations file

Box coordinates (x center, y center, width, height) must be normalized (from 0–1),
which means they must be divided by the image width and height.
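As an illustration of this normalization (a small sketch of our own; Roboflow performs the conversion automatically), a pixel-space box is converted to a YOLO annotation line as follows:

def to_yolo(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    # Convert a pixel-space box to the normalized YOLO line: class x_center y_center width height
    x_center = (x_min + x_max) / 2.0 / img_w
    y_center = (y_min + y_max) / 2.0 / img_h
    width = (x_max - x_min) / float(img_w)
    height = (y_max - y_min) / float(img_h)
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# A 100x200-pixel person box centred at (208, 208) in a 416x416 image:
print(to_yolo(0, 158, 108, 258, 308, 416, 416))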

4.1.3 Detection phase


In the detection phase, an object detection algorithm is applied to an image to
detect potential objects. Since we are dealing with real-time constraints due to the
robot and alarm management system, as a result, the speed of detection matters and
is a crucial factor in our system.

4.1.3.1 Implementation and configuration details

We chose to apply the YOLOv3-tiny model. The YOLOv3-tiny model is a downsized version of the original YOLOv3 model [14]. It is designed for fast object detection. The YOLOv3-tiny network can easily meet real-time requirements on a typical computer with a modest Graphics Processing Unit (GPU). The model is based on the Darknet framework and can be used in a variety of different applications.
When working with a custom YOLOv3-tiny model, a few changes must be made in order for the model to be trained correctly. The following instructions were executed throughout the training process:

• Download the pretrained weights: yolov3-tiny.conv.11 .

• Clone and build the darknet repository.

• Create and edit the configuration file yolov3-tiny-custom.cfg as shown in Table 4.1. The number of filters before each YOLO layer is computed as filters = (nb_classes + 5) × 3; with our single class this gives (1 + 5) × 3 = 18, which matches the table. Setting the batch size to 64 means that 64 images are used for every training iteration.


Parameter       Value
Batch           64
Subdivisions    16
Width           416
Height          416
Max batches     12000
Filters         18
Classes         1
Learning rate   0.001

Table 4.1: The configurations in the cfg file

• Create obj.data and obj.names as a pre-training step: obj.names contains the class names; in our case the only class name is "person". Figure 4.3 shows the obj.data file, which contains the number of classes and the locations of the train and validation samples.

Figure 4.3: obj.data file

4.1.3.2 Training and testing

Now that our dataset is ready, we can launch the training command to start the training.

Figure 4.4: training command

The training results were as follows: mAP = 40%, average loss = 2.
After finishing the training process, the weights are saved in the backup folder. This file is used in the testing phase. For the test command, we specified the obj.data file, the custom config file and the best weights obtained from the training, as shown in figure 4.5.


Figure 4.5: testing command

4.2 Deployment phase


At this point, our model is trained and tested with our custom data and the testing
results are fairly accurate. In other words, our model is ready to be deployed on the
Nvidia Jetson Nano card using DeepStream SDK. The inference process is based on
two elements : Hardware and Software.

4.2.1 Hardware tools


The NVIDIA Jetson Nano [33] is a powerful device for embedded computing. It is designed to be used in a wide range of applications, including robotics, drones, and edge computing devices, and it gives you the ability to run multiple neural networks in parallel. Its small form factor and low power consumption make it ideal for use in embedded devices and drones. The Jetson Nano also comes with a full-sized HDMI port, four USB 3.0 ports, and an M.2 slot. We mention the characteristics of the device on which we deployed our pipeline because they give an idea of the working conditions. The NVIDIA Jetson Nano characteristics are the following:

• CPU: Quad-core ARM Cortex-A57 MPCore processor

• GPU: NVIDIA Maxwell architecture with 128 NVIDIA CUDA® cores

• Memory: 4 GB 64-bit LPDDR4, 1600MHz 25.6 GB/s

• Storage: 16 GB eMMC 5.1


Figure 4.6: NVIDIA Jetson Nano [33]

4.2.2 Software tools


In order to deploy our model, we used DeepStream SDK that we will explain in the
next part.

4.2.2.1 JetPack SDK

The NVIDIA JetPack SDK[34] is the most complete platform for creating AI appli-
cations. It contains the most recent Jetson OS images, as well as libraries and APIs,
samples, developer tools, and documentation. DeepStream for streaming video analyt-
ics, Isaac for robotics, and Riva for conversational AI are all supported by JetPack SDK.
It is a software development kit for the NVIDIA Jetson embedded computers. The SDK provides a complete Linux environment for developing CUDA applications as well as support for installing the latest version of the CUDA toolkit. We chose to install JetPack version 4.5 since it supports DeepStream 5.1, which we will be working with.

Figure 4.7: NVIDIA JetPack


4.2.2.2 DeepStream SDK

Figure 4.8: NVIDIA DeepStream

DeepStream [35] is a streaming analytics toolkit that enables developers to build real-time video analytics pipelines. It is an SDK for developing high-performance video analytics applications on NVIDIA GPUs, and it allows developers to build and deploy sophisticated video analytics pipelines on those GPUs. It also includes a set of pre-built, optimized plugins for popular open-source computer vision and deep learning frameworks. It is scalable, fault-tolerant, and easy to use, with a rich set of features that include support for Apache Kafka and other messaging and big data technologies.

Figure 4.9: DeepStream SDK [35]

DeepStream SDK is based on the GStreamer framework[36]. GStreamer is a powerful


multimedia framework that enables the creation of a wide variety of media-rich appli-
cations. The framework is based on a plug-in architecture that allows developers to


create custom plug-ins to extend the functionality of the framework. The GStreamer
framework is an open source project.
In fact, the GStreamer-based DeepStream reference application, as shown in figure 4.10, consists of a collection of GStreamer plugins that encapsulate low-level APIs to create an entire graph. The reference application can accept input from a wide range of sources, including cameras, RTSP input and encoded file input, and it also supports multiple streams and sources. NVIDIA has implemented and made available a number of GStreamer plugins as part of the DeepStream SDK (a minimal pipeline sketch combining some of them is given after the list), including:

- Gst-nvstreammux: the Stream Muxer plugin to build a batch of buffers from


several input sources.
- Gst-nvinfer: the NVIDIA®TensorRT™ based plugin does inference using Ten-
sorRT on the input data.
- Gst-nvtracker: the OpenCV-based tracker plugin for object tracking with a unique
ID.
- Gst-nvmultistreamtiler: the Multi Stream Tiler plugin for compositing 2D array
of frames.
- The Onscreen Display (OSD) plugin (Gst-nvdsosd) to draw shaded boxes, rectan-
gles and text on the composited frame using the generated metadata.
- The Message Converter (Gst-nvmsgconv) and Message Broker (Gst-nvmsgbroker)
plugins in combination to send analytics data to a server in the Cloud.
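As announced above, here is a minimal sketch of how such a graph can be assembled with the GStreamer Python bindings (our own illustration, not our production pipeline: the element names come from the plugin list above, while the input file, resolution and configuration path are assumptions):

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# Decode a file, batch it with nvstreammux, run nvinfer with a detector config,
# draw the resulting bounding boxes with nvdsosd and render the frames (Jetson sink).
pipeline = Gst.parse_launch(
    "filesrc location=sample.h264 ! h264parse ! nvv4l2decoder ! mux.sink_0 "
    "nvstreammux name=mux batch-size=1 width=416 height=416 ! "
    "nvinfer config-file-path=config_infer_primary_yoloV3_tiny.txt ! "
    "nvvideoconvert ! nvdsosd ! nvegltransform ! nveglglessink"
)
pipeline.set_state(Gst.State.PLAYING)
pipeline.get_bus().timed_pop_filtered(
    Gst.CLOCK_TIME_NONE, Gst.MessageType.EOS | Gst.MessageType.ERROR)
pipeline.set_state(Gst.State.NULL)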

Figure 4.10: Architecture of the NVIDIA DeepStream reference application [42]

4.2.2.3 TensorRT

TensorRT[37] is a high performance inference engine for Nvidia GPUs. It enables de-
velopers to optimize models for deployment on embedded, mobile, and server platforms.
TensorRT can also be used to improve the performance of deep learning applications
such as image classification, object detection, and machine translation.


Figure 4.11: NVIDIA TensorRT

4.2.3 VPS Toolkit


The VPS Toolkit[30] is an MIP SDK toolkit for forwarding video from an XProtect
camera device to a GStreamer pipeline.
A VPS Driver-based device can be used to bring video and/or metadata output from
the GStreamer pipeline back into the XProtect VMS. However, this is not required; it
is entirely up to you how to consume the GStreamer pipeline’s output.
An overview diagram of an XProtect VMS configuration with the VPS Toolkit process-
ing video and delivering it back into the XProtect VMS is shown below.

Figure 4.12: VPS overview [30]

The orange arrows show how video and metadata from a real camera flow through the
system. A Camera Driver captures video and metadata, which is then transmitted to


be optionally recorded on disk and provided as a live stream to the Smart Client. In
an XProtect VMS, this is how the typical flow of video and metadata works.
The VPS Driver sends video to a VP Service, an ASP.NET Core 3.1 web application,
which processes it using a GStreamer pipeline. The video and/or metadata output from
the GStreamer pipeline can optionally be transmitted back into the XProtect VMS via
the VPS Driver. This is represented in the diagram by the blue arrows. The VPS
Driver receives the feed from the VP Service and exposes it via a new camera device
and a new metadata device. The video and/or metadata feeds can now be used exactly
as if they had originated from a physical camera device. You can, for example, record
the feed and view both the current stream and the recorded data in the Smart Client.

4.2.4 Deployment
The first step is to install JetPack version 4.5 on the Jetson Nano and install DeepStream SDK 5.1. Second, we create two text files named respectively 'deepstream_app_config_yoloV3_tiny.txt' and 'config_infer_primary_yoloV3_tiny.txt' that contain parameters about our custom model and paths to our custom yolov3-tiny configuration file, the labels file and the best weights. We should change the batch size and subdivisions to 1 for the test.
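For reference, the inference configuration follows the standard DeepStream nvinfer format. A trimmed, illustrative excerpt of what config_infer_primary_yoloV3_tiny.txt can look like for a single-class model is shown below; the paths, thresholds and function name follow the DeepStream YOLO sample and are assumptions rather than our exact production values:

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
custom-network-config=yolov3-tiny-custom.cfg
model-file=yolov3-tiny-custom_best.weights
labelfile-path=labels.txt
batch-size=1
# 0=FP32, 1=INT8, 2=FP16
network-mode=2
# only the "person" class
num-detected-classes=1
gie-unique-id=1
parse-bbox-func-name=NvDsInferParseCustomYoloV3Tiny
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so

[class-attrs-all]
# detection confidence and NMS thresholds (example values)
threshold=0.25
nms-iou-threshold=0.3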
The nvdsinfer_custom_impl_Yolo folder, which contains the YOLO custom implementation files, should be rebuilt using the command in figure 4.13.

Figure 4.13: make command

Also, in order to parse the bounding boxes correctly, we have created a new parse-bbox function in this folder, which the DeepStream configuration file references. This function depends on the anchors and masks belonging to our custom model's config file. We should also specify the number of classes, which in our case is one.
After we finish preparing our files, we run the command in figure 4.14, which creates a TensorRT engine file and runs the application on a specific video whose location is given in the deepstream_app_config file.

Figure 4.14: deepstream command


The last step is to run the VPService on Linux using the command make run. As shown in figure 4.15, we have specified the port on which our service will listen by editing the 'appsettings.Production.json' file. We have also modified the GStreamer plugins, which are written in C++: "vpsdeepstream", in which we specified the path of the DeepStream application we created, and "vpsnvdstoonvif", which sits at the end of the pipeline and converts the bounding boxes created by the DeepStream elements into ONVIF format, passing them along as vpsonvifmeta. This allows XProtect to interpret them and display the bounding boxes in the XProtect Smart Client.
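For illustration, a standard ASP.NET Core way of setting the listening address from appsettings.Production.json is shown below; the exact key layout used by the VPS sample may differ, and the port is only an example:

{
  "Urls": "http://0.0.0.0:5000",
  "Logging": {
    "LogLevel": {
      "Default": "Information"
    }
  }
}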

Figure 4.15: running VPS

Furthermore, the DeepStream SDK provides numerous ready-to-use DeepStream applications based on different models, such as the Primary Nano model we used to test our VPService.

4.3 Integration into Milestone XProtect


A plugin, also called an add-on or extension, is a piece of software that adds functionalities to an existing software application without altering the host program itself.
In the context of Milestone XProtect, plugins are small programs that add features to the video management software. These features can range from simply displaying streams in the software to complex video analytics solutions.

4.3.1 General presentation of MIP SDK


The Milestone XProtect system is widely regarded as one of the leading video management systems. It is with this in mind that we decided to integrate our plugin into the Milestone VMS. In order to achieve this integration, Milestone provides the MIP SDK, the "Milestone Integration Platform Software Development Kit". An MIP plug-in allows us to create an operational interface directly integrated into the XProtect Smart Client or Management Client. The figure below represents the overall architecture of the MIP SDK.


Figure 4.16: MIP SDK architecture [29]

To provide maximum flexibility and enable optimal integration of different types of


systems and applications, the integration platform offers three main integration options
which are: Plug-in integration / Component integration / Protocol integration.

• Protocol integration:
A basic integration method that is especially well suited for the integration of
non-Windows applications.

• Component integration:
Allows you to implement MIP components into your application, which is useful
when using Milestone libraries in a Windows-based application.

• Plug-in integration:
The most refined method of integration. This allows us to integrate our plugin
into the Milestone XProtect Management environment and to run our plugin as
an integral part of XProtect software and its client applications.

4.3.2 Development environment


The development environment consists of two parts: hardware and software envi-
ronments.

4.3.2.1 Hardware environment

We mention the characteristics of our computers on which we developed our models


because they can give an idea of the working conditions. Our plugin was built on a


ConceptD 500 Acer computer shown in the figure 4.17 whose characteristics are the
following:

• CPU: Intel® Core™ i9 (i9 - 9900K, 3.60 GHz, 16 MB)

• GPU: NVIDIA Quadro RTX 4000

• RAM: 16GB of 2666 MHz DDR4

• Storage: 2 TB

Figure 4.17: ConceptD 500

4.3.2.2 Software environment and technologies

• Milestone VMS: one of the most widely used video management systems; it provides the MIP SDK that allows us to develop our plugin within the Milestone system.

Figure 4.18: Milestone XProtect VMS [32]

• Visual Studio: is a comprehensive set of development tools for creating ASP.NET


web applications, XML web services, desktop applications, and mobile applica-
tions.


Figure 4.19: Visual Studio logo

• StarUML: an open-source modeling platform based on the UML language. This tool offers all the diagrams necessary for good modeling.

Figure 4.20: StarUML logo

• .NET Framework: a framework for the Microsoft Windows and Microsoft Windows Mobile operating systems that aims to simplify the task of developers by providing a unified method for designing Windows or web applications.

Figure 4.21: .Net framework

• Windows Presentation Foundation (WPF): is a user interface framework


for building desktop client applications. It uses XAML (Extensible Application
Markup Language) to provide a declarative model for application programming.

Figure 4.22: WPF


4.3.3 VMS Plugin development


Our analytics plugin reads and analyzes the metadata received by the VPS Driver from the VPService. First of all, to assist with configuration, the VPS Toolkit provides a Management Client plugin called VPS Configuration. The plugin provides a simple interface for choosing camera sources and specifying the VP Service and GStreamer plugins that will be used to process the feeds.
The configuration steps in the Management Client are as follows:

• Adding the IP cameras to the recording server[38] manually by entering the IP


address and port.

• Setting the VPS configuration in the Management Client plugin by adding a VP Service URL to a source camera or a group; in our case we add it to a group since the robot has 4 cameras. As a result, 4 VPS hardware devices are created. Each VPS hardware has 2 devices: a camera device and a metadata device that is set as related metadata of the source camera.

• Defining the analytics event that will be sent on port 9090 to the Event Server. For better security, you can specify the network addresses that are allowed to send the analytics event.

• Configuring a rule [39]. A rule determines highly important settings, such as when cameras should record. In our case, we set a rule to start recording for 25 seconds on the devices related to the metadata.

• Defining the alarm in the Management Client. The alarm sources are the 4 cameras, and the triggering event is the one we have defined. We set the alarm priority to high. Basically, the alarm is generated by the Event Server and appears in the Smart Client.

The plugin is responsible for reading, in real-time, the metadata of each VPS metadata device (four in our case), analyzing it, and checking whether a person is detected; if so, an analytics event in XML format is sent to the Event Server over a TCP/IP connection. The source of the analytics event is the camera in which the person was detected. As a result, a recording is started on that source camera.
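The plugin itself is written in C# against the MIP SDK; the following Python-style sketch of ours only illustrates the transport idea of pushing an XML analytics event to the Event Server's analytics port (9090 by default). The payload below is a placeholder, not the real MIP SDK analytics event schema:

import socket

EVENT_SERVER = ("127.0.0.1", 9090)   # Event Server address and analytics events port (example values)

def send_analytics_event(xml_payload: str) -> None:
    # Open a TCP connection to the Event Server and push the XML analytics event.
    with socket.create_connection(EVENT_SERVER, timeout=5) as sock:
        sock.sendall(xml_payload.encode("utf-8"))

# Placeholder payload; the real event follows the MIP SDK analytics event XML schema
# and names the source camera on which the person was detected.
send_analytics_event("<AnalyticsEvent><!-- event fields per MIP SDK schema --></AnalyticsEvent>")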

4.4 Implementation
This section is dedicated to the presentation of the main interfaces of our solution
mainly the plugin.


4.4.1 Authentication interface


The following figure 4.23 illustrates the essential connection phase: customers can
connect to Milestone by entering their names and passwords.

Figure 4.23: Login interface

4.4.2 Milestone XProtect Smart Client home interface


Figure 4.24 represents the Smart Client home interface, which is the "Live" tab. The robot has 4 cameras: 3 optical cameras and one thermal camera. The picture below shows the home page when no VPS is activated, which means that no bounding boxes are displayed around objects.


Figure 4.24: Home interface

4.4.3 Plugin interface


Figure 4.25 shows the menu of the Smart Client, where we can see the "ClientPlugins" tab representing the plugin that we integrated into the Milestone XProtect VMS.

Figure 4.25: Menu in Smart Client

Figure 4.26 represents the plugin that we developed, which is integrated into the Milestone XProtect VMS. As shown in the figure, we have two buttons in our plugin: the "Start" button starts the monitoring of the area and triggers alarms whenever a suspicious presence is detected, while the "Stop" button stops the monitoring process.

• Start button: this launches a parallel thread that reads the metadata of all streams received from the VPS; if a bounding box of a human is present, an analytics event [40] is sent to the Event Server [38], which is responsible for a variety of functions relating to events, alarms and maps, as well as third-party integrations via the MIP SDK, in order to trigger an alarm that activates the recording on the related device for a defined period.

• Stop button: this stops the monitoring of the robot's cameras, which means the surveillance and the alarms are disabled.


Figure 4.26: Plugin interface

4.4.4 Alarm manager interface


The alarm manager interface shows the triggered alarms with their camera sources. You can see in figure 4.27 that five alarms were triggered. Each one displays the recorded stream during the time of the alarm. Our program runs every 25 seconds for every single camera to prevent occluded videos in real-time. You can export the recorded streams, which can be found in the 'Playback' tab of the menu.

Figure 4.27: Alarm manager interface

When testing our model on the robot's cameras, we observed that the model can detect people on the thermal camera even though it was trained only on visible images, which highlights its effectiveness. Figure 4.28 shows how our model detects on the thermal camera.


Figure 4.28: Detection on thermal camera

Conclusion
This last chapter was devoted to presenting the steps followed to realize our solution and the results of our project. To this end, we presented the different environments and techniques adopted, followed by a presentation of the interfaces of our platform through some screenshots.

General conclusion

Our end-of-studies internship was carried out within the startup ”Enova Robotics”.
It allowed us to set up a solution for the integration of AI-powered video analytics
into Milestone XProtect to help the security guard perform his task correctly and comfortably through a video management system.

To achieve our goal, we started with studying the existing solutions. Then we ex-
plored the methodology necessary to carry out our project.

We also extracted the functional needs required by the specifications to develop the
preliminary design to finish with the detailed design. Finally, we presented the work
carried out as well as the different environments and techniques adopted.

We obviously faced some technical problems when integrating the environments, which led us to train ourselves to solve these problems and better handle the tools. During the realization of this project, we have mastered the .NET Framework; the latter is a real challenge and requires a lot of effort to go further in the development and integration of other features. We also used the Python programming language during model development and C++ in the deployment phase.

In addition to the advantages on the technical side, this project was an opportunity
to improve and develop our communication skills and allowed us to better integrate
into the professional world.

In perspective, we admit that this work is only a glimpse of professional life. We


therefore aim to consider several improvements, such as the possibility of integrating multiple deep learning models in the same plugin, so that instead of starting the monitoring and detecting only people, we could choose between multiple objects to detect.

Bibliography

[1] Wei Fang, Lin Wang, and Peiming Ren. Tinier-yolo: A real-time object detection
method for constrained environments. IEEE Access, PP:1–1, 12 2019.

[2] Sebastian Ruder. An overview of gradient descent optimization algorithms. CoRR,


abs/1609.04747, 2016.

[3] Ross B. Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich fea-
ture hierarchies for accurate object detection and semantic segmentation. CoRR,
abs/1311.2524, 2013.

[4] Shaoqing Ren, Kaiming He, Ross B. Girshick, and Jian Sun. Faster R-
CNN: towards real-time object detection with region proposal networks. CoRR,
abs/1506.01497, 2015.

[5] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott E. Reed,
Cheng-Yang Fu, and Alexander C. Berg. SSD: single shot multibox detector.
CoRR, abs/1512.02325, 2015.

[6] Juan Du. Understanding of object detection based on CNN family and YOLO.
Journal of Physics: Conference Series, 1004:012029, apr 2018.

[7] J. Lin and M. Sun. A yolo-based traffic counting system. In 2018 Conference on
Technologies and Applications of Artificial Intelligence (TAAI), pages 82–85, Los
Alamitos, CA, USA, dec 2018. IEEE Computer Society.

[8] Aleksa Ćorović, Velibor Ilić, Siniša Đurić, Mališa Marijan, and Bogdan Pavković.
The real-time detection of traffic participants using yolo algorithm. In 2018 26th
Telecommunications Forum (TELFOR), pages 1–4, 2018.

[9] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look
once: Unified, real-time object detection. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition (CVPR), June 2016.

[10] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. CoRR,
abs/1804.02767, 2018.


[11] Upesh Nepal and Hossein Eslamiat. Comparing yolov3, yolov4 and yolov5 for
autonomous landing spot detection in faulty uavs. Sensors, 22(2):464, 2022.

[12] Tao Li, Yitao Ma, and Tetsuo Endoh. A systematic study of tiny yolo3 inference:
Toward compact brainware processor with less memory and logic gate. IEEE
Access, PP:1–1, 08 2020.

[13] Dong Xiao, Feng Shan, Ze Li, Ba Tuan Le, Xiwen Liu, and Xuerao Li. A target
detection model based on improved tiny-yolov3 under the environment of mining
truck. IEEE Access, 7:123757–123764, 2019.

[14] Wen-Hui Chen, Han-Yang Kuo, Yu-Chen Lin, and Cheng-Han Tsai. A lightweight
pedestrian detection model for edge computing systems. In International Sympo-
sium on Distributed Computing and Artificial Intelligence, pages 102–112. Springer,
2020.

Webliography

[1] https://www.enovarobotics.eu/

[2] https://enovarobotics.eu/pguard/

[3] https://enovarobotics.eu/minilab/

[4] https://www.enovarobotics.eu/other-products/

[5] https://www.enovarobotics.eu/agv/

[6] https://www.veertec.com/

[7] https://www.milestonesys.com/marketplace/veertec-ltd/zen-analytics-
platform/

[8] https://www.openpath.com/vms

[9] https://www.briefcam.com/resources/videos/

[10] https://en.wikipedia.org/wiki/Unified_Process

[11] https://fr.wikipedia.org/wiki/Two_Tracks_Unified_Process

[12] https://www.ibm.com/cloud/blog/ai-vs-machine-learning-vs-deep-
learning-vs-neural-networks

[13] https://towardsdatascience.com/everything-you-need-to-know-about-
neural-networks-and-backpropagation-machine-learning-made-easy-
e5285bc2be3a

[14] https://www.researchgate.net/figure/Activation-function-a-Sigmoid-
b-tanh-c-ReLU_fig6_342831065

[15] https://towardsdatascience.com/softmax-activation-function-
explained-a7e1bc3ad60

[16] https://sebastianraschka.com/faq/docs/gradient-optimization.html


[17] https://towardsdatascience.com/adam-latest-trends-in-deep-learning-
optimization-6be9a291375c

[18] https://medium.com/@RaghavPrabhu/understanding-of-convolutional-
neural-network-cnn-deep-learning-99760835f148

[19] https://wansook0316.github.io/ds/dl/2020/09/02/computer-vision-02-
RCNN.html

[20] https://towardsdatascience.com/r-cnn-fast-r-cnn-faster-r-cnn-yolo-
object-detection-algorithms-36d53571365e

[21] https://www.researchgate.net/figure/Faster-R-CNN-Architecture-
9_fig1_324549019

[22] https://www.researchgate.net/figure/Comparison-of-test-time-speed-
of-different-R-CNN_fig7_340712186

[23] https://m.researching.cn/articles/OJ60cb5c99374ba95f/figureandtable

[24] https://pyimagesearch.com/2018/11/12/yolo-object-detection-with-
opencv/

[25] https://www.researchgate.net/figure/Comparison-of-different-
methods-in-FPS-and-mAP_tbl1_345653552

[26] https://us.hikvision.com/en/partners/technology-partners/vms

[27] https://www.genetec.com/products/unified-security/security-center

[28] https://www.milestonesys.com/solutions/platform/video-management-
software/

[29] https://doc.developer.milestonesys.com/html/index.html

[30] https://doc.developer.milestonesys.com/html/gettingstarted/intro_
vps_toolkit.html

[31] https://cocodataset.org/#home

[32] https://content.milestonesys.com/search/media/?field=metaproperty_
Assettype&value=CompanyLogo&field=metaproperty_Assetcategory&
value=Logo&multiple=true&filterType=add&hideFilter=false&filterkey=
savedFilters

[33] https://developer.nvidia.com/embedded/jetson-nano-developer-kit


[34] https://developer.nvidia.com/embedded/jetpack

[35] https://developer.nvidia.com/deepstream-sdk

[36] https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_
plugin_Intro.html

[37] https://developer.nvidia.com/tensorrt

[38] https://doc.milestonesys.com/latest/en-US/system/sad/sad_
servercomponents.htm

[39] https://doc.milestonesys.com/latest/en-US/standard_features/
sf_mc/sf_mcnodes/sf_5rulesandevents/mc_rulesandeventsexplained_
rulesandevents.htm

[40] https://doc.milestonesys.com/latest/en-US/standard_features/sf_mc/
sf_mcnodes/sf_5rulesandevents/mc_analyticsevents_rulesandevents.htm

[41] https://doc.milestonesys.com/latest/en-US/system/security/
hardeningguide/hg_recordingserver.htm

[42] https://docs.nvidia.com/metropolis/deepstream/4.0.2/dev-guide/
index.html#page/DeepStream%20Development%20Guide/deepstream_app_
architecture.html

[43] https://www.crowdhuman.org/

[44] https://voxel51.com/docs/fiftyone/integrations/coco.html

[45] https://app.roboflow.com/

[46] https://www.youtube.com/watch?v=o9K6GDBnByk

