
INTELLIGENT SECURITY MONITORING SYSTEM

WITH VIDEO BASED FACE RECOGNITION


A PROJECT REPORT

Submitted by

LOGESH V S 1807029
NAGALAKSHMI R 1807033
SHANTHINI G 1807046

In partial fulfillment for the award of the degree


of

BACHELOR OF TECHNOLOGY
in
INFORMATION TECHNOLOGY

COIMBATORE INSTITUTE OF TECHNOLOGY, COIMBATORE-641014


(Government Aided Autonomous Institution Affiliated to Anna University)

ANNA UNIVERSITY, CHENNAI 600025

MARCH 2021

COIMBATORE INSTITUTE OF TECHNOLOGY
(A Govt. Aided Autonomous Institution Affiliated to Anna University)
COIMBATORE – 641014

BONAFIDE CERTIFICATE

Certified that this project report titled “INTELLIGENT SECURITY


MONITORING SYSTEM WITH VIDEO BASED FACE RECOGNITION”
is the bonafide work of LOGESH VS (1807028), NAGALAKSHMI R (1807033)
and SHANTHINI G (1807046) in partial fulfillment for the award of the Degree of
Bachelor of Technology in Information Technology of Anna University, Chennai
during the academic year 2020-2021 under my supervision.

Prof. N.K. KARTHIKEYAN, Mr. N. SELVAMUTHUKUMARAN,
HEAD OF THE DEPARTMENT, SUPERVISOR,
Department of Information Technology, Department of Information Technology,
Coimbatore Institute of Technology, Coimbatore Institute of Technology,
Coimbatore - 641014. Coimbatore - 641014.

Certified that the candidates were examined by us in the project work viva-voce
examination held on …………………

Internal Examiner External Examiner


Place:
Date:

TABLE OF CONTENTS

CHAPTER NO. TITLE PAGE NO.

ACKNOWLEDGEMENT V

ABSTRACT VI

LIST OF ABBREVIATIONS VII

1 INTRODUCTION 1

1.1 NEED FOR SECURITY 1

1.2 MACHINE LEARNING 1


1.3 FACE RECOGNITION 2
1.4 VIOLA-JONES ALGORITHM 3

1.5 CONVOLUTIONAL NEURAL NETWORK 4

2 LITERATURE SURVEY 6

3 SYSTEM ARCHITECTURE 14

4 SYSTEM SPECIFICATION 17

4.1 HARDWARE SPECIFICATION 17

4.2 SOFTWARE SPECIFICATION 17

5 DESIGN & IMPLEMENTATION 18

5.1 VIDEO FRAGMENTATION 18

5.2 FACE DETECTION 18

5.2.1 HAAR-LIKE FEATURE GENERATION 18


5.2.2 INTEGRAL IMAGE 19
5.2.3 ADABOOST TRAINING 19

5.2.4 CASCADE CLASSIFIER 20
5.3 PREPROCESSING OF FRAMES 21

5.4 TRAINING A CNN MODEL 21

5.4.1 CONFUSION MATRIX GENERATION 21

5.5 EVALUATION OF MODEL 22

6 IMPLEMENTATION 23

6.1 FUNCTION 23
6.1.1 ALGORITHM 23
7 CONCLUSION AND FUTURE WORK 31

8 APPENDIX 32
APPENDIX – I

8.1 SNAPSHOTS OF OUTPUT 32

APPENDIX – II

8.2 SOURCE CODE 34

9 REFERENCES 47

ACKNOWLEDGEMENT

Our project “Intelligent Security Monitoring System with video based face
recognition” has been the result of motivation and encouragement from many, whom
we would like to thank.

We express our sincere thanks to our Secretary Dr. R. Prabhakar and our
Principal Dr. V. Selladurai for providing us a great opportunity to carry out our
work. Words are rather inadequate to express our gratitude to them. This work is
the outcome of their inspiration and the product of a plethora of their knowledge
and rich experience.

We record our deep sense of gratitude to Dr. N.K. Karthikeyan, Head of the
Department of Information Technology, for his encouragement and support during
this tenure.

We equally tender our sincere gratitude to our project guide
Mr. N. Selvamuthukumaran, Department of Information Technology, for his
valuable suggestions and guidance during this course.

During the entire period of study, the staff members of the Department of
Computer Science and Engineering & Information Technology have offered
ungrudging help. It is also a great pleasure to acknowledge the unfailing help we
have received from our friends.

It is a matter of great pleasure to thank our parents and family members for
their constant support and cooperation in the pursuit of this endeavour.

ABSTRACT

Areas with a large flow of people, such as airports and border control areas, face
frequent emergency situations, so a higher degree of security is required there to
prevent unwanted coercive change. Security in such conditions is traditionally
provided by personnel, and monitoring crime with limited manpower while providing
complete security is difficult. Another major issue is that the high volume of video
data makes video analysis by a human complex. Intelligent video retrieval technology
has become a crucial part of video monitoring, and face recognition has proven very
effective in security-critical environments. Hence, this system has been developed to
recognize the faces of suspects, using the Viola-Jones algorithm for face detection;
it is capable of identifying a person from a video frame, bringing better accuracy
and establishing stability in security. The system also applies a convolutional
neural network to process the image information from the video and verify the
person. The faces in the surveillance video are extracted and recorded in real time,
and with the use of a deep learning model built on a convolutional neural network,
single-face and multi-face images are detected and recognized to effectively assist
the security personnel in dealing with a crisis. The system not only has high
academic value, but will also contribute greatly to national security, social
stability and so on.

LIST OF ABBREVIATIONS

ABBREVIATION EXPANSION

CNN Convolutional Neural Network

AI Artificial Intelligence

ML Machine Learning

CHAPTER-1
INTRODUCTION

1.1 Need for Security

Today's world is plagued by major security problems, necessitating the deployment
of many specially trained surveillance personnel to achieve the required level of
security. The aim of security is to protect and extend people's fundamental freedoms.
It necessitates both the protection of people from serious and widespread threats and
the empowerment of people to take control of their own lives. Hence arises the
primary responsibility of ensuring stability. As human beings, these personnel can
make mistakes that compromise the level of protection. With the exponential growth of
video surveillance and analysis, the amount of information present in the monitored
footage has outpaced the effective processing range of human resources. Moreover,
since there is a huge volume of video data and security personnel typically refer to
the recordings only after the time of a crisis, the process of enquiry is very
complicated.

1.2 Machine Learning

Machine learning is a form of data analysis that automates the development of
analytical models. It is a branch of artificial intelligence built on the premise
that computers can learn from data, recognize patterns, and make decisions with
little to no human input. It is a branch of computer science that enables computers
to learn without being explicitly programmed. Machine learning is one of the most
exciting developments one has ever encountered. As the name suggests, it gives the
machine the opportunity to learn, which makes it more human-like. It deals with the
concept wherein data (input) and output are fed in and run on the machine during
training, and the machine creates its own program (logic), which can be evaluated
during testing.

Machine learning is currently in use, perhaps in far more places than one would
think. We possibly employ a learning algorithm on a regular basis without even
realizing it. It has a variety of applications, including web search engines,
photo-tagging applications and spam detectors.

Machine learning is now being used by businesses to improve business decisions,
increase efficiency, identify disease, predict weather, and perform many other
tasks. Thanks to the exponential growth of technology, we need better tools to
understand the data we have now, and we also need to plan for the data we will
have in the future. To accomplish this, we must create intelligent machines. We
can write a program to do simple tasks, but hardwiring intelligence into it is
always challenging. The best way to do it is to devise a method for machines to
learn by themselves: if a computer can learn from feedback, it can do the heavy
lifting for us.

1.3 Face recognition

In today's automation era, machines with artificial intelligence process all
data, which is then used in a variety of sophisticated applications. Despite the
fact that many agencies have built cutting-edge security systems, recent terrorist
attacks have uncovered significant flaws in even the most advanced of them. As a
result, various agencies are taking security data based on physical or behavioural
activities more seriously and are motivated to develop it. Face recognition, on the
other hand, can be achieved passively, without the user being involved, because
faces are captured by the camera from a distance, making it more suitable for
protection and surveillance purposes. Facial recognition technology raises the bar
on monitoring by allowing for automatic and indiscriminate live surveillance as
well as an intelligent video retrieval system that incorporates video processing
and artificial intelligence to significantly enhance security monitoring
performance and accuracy. Image analysis and processing, pattern recognition,
signal processing, embedded computing, and communication are all areas of research
in video surveillance systems. Face recognition systems that use video typically
have two modules: one that locates the face and another that recognizes it.

Face recognition is a technique for recognizing or confirming an individual's
identity by looking at their face. Face recognition systems can recognize
individuals in photographs, videos, or in real time. They employ computer
algorithms to identify unique, distinguishing features on a person's face. These
details, such as eye distance or chin shape, are then transformed into a
mathematical representation and compared to data from other faces in a face
recognition database. A face prototype is data about a specific face; it differs
from an image in that it is structured to contain only the information that can
be used to differentiate one face from another.

1.4 Viola-Jones Algorithm

Face detection is a crucial step in the process because it identifies and locates
human faces in photographs and videos. The Viola-Jones detection framework
combines the concepts of Haar-like features, integral images, the AdaBoost
algorithm, and the cascade classifier to construct a fast and accurate face
detection system. A Haar-like feature is made up of darker and lighter areas. The
sum of light-region intensities is subtracted from the sum of dark-region
intensities, yielding a single value. Haar-like features enable us to extract
useful image information for processing, such as edges and diagonal and straight
lines, that is suitable for sensing a face. Since calculating the total of
dark/light rectangular regions is necessary to extract Haar-like features,
representing the image as an integral image (as given in figure 8) cuts down on
the time it takes to complete the task and allows for quick feature evaluation.

Fig 1.1 Viola Jones Framework

The AdaBoost (Adaptive Boosting) algorithm is a machine learning algorithm that
selects the best subset of features from a large number of options. The algorithm
produces, as the outcome of its work, a classifier named the strong classifier. A
cascade classifier is a multi-stage classifier capable of fast and accurate
detection. The AdaBoost algorithm generates a strong classifier for each stage. A
sequential (stage-by-stage) evaluation is performed on an input. This multi-stage
approach enables the development of simpler classifiers, which can then be used to
rapidly reject the majority of negative (non-face) input while devoting more time
to positive (face) input.

1.5 Convolutional neural network

Given an input image and a name or ID of a person, the main job of the system is
to verify whether or not the input image is that of the claimed individual. This
process is carried out by convolutional neural networks. Convolutional neural
networks are a form of neural network that works particularly well for images,
because they provide building blocks with fewer parameters and perceive different
items at each layer of the network.

A Convolutional Neural Network is a deep learning algorithm which can take in an
input image, assign importance (learnable weights and biases) to various
aspects/objects in the image, and differentiate one from the other. The
preprocessing required in a ConvNet is much lower compared to other classification
algorithms. While in primitive methods filters are hand-engineered, with enough
training, ConvNets have the ability to learn these filters/characteristics. The
architecture of a ConvNet is analogous to the connectivity pattern of neurons in
the human brain and was inspired by the organization of the visual cortex.
Individual neurons respond to stimuli only in a restricted region of the visual
field known as the receptive field. A collection of such fields overlap to cover
the entire visual area.

Fig 1.2 Convolutional neural network

A ConvNet is able to successfully capture the spatial and temporal dependencies in
an image through the application of relevant filters. The architecture fits the
image dataset better due to the reduction in the number of parameters involved and
the reusability of weights. In other words, the network can be trained to
understand the sophistication of the image better.

The pre-warning analysis of the captured video images is the key priority of the
video surveillance system in the field of public security, but post-hoc video
analysis wastes a lot of manpower and resources. As a result, facial recognition
technology is being used in public security video surveillance, which will reduce
the risk of criminal activity and preserve social stability. In short, video
surveillance, which is needed for protection, began to shift from "visible" to
"comprehensible". The versatile nature of face recognition technology made us
aware of the importance of broadening and deepening its application level.

CHAPTER -2

LITERATURE SURVEY

[1] Similar Face Recognition Using the IE-CNN Model (2020), proposed by An-ping
Song and Qian Hu. In this paper, two factors are identified as needed to make
progress in this field: (i) the availability of large-scale similar-face training
datasets, and (ii) a fine-grained face recognition method. With the above factors
fulfilled, the authors make two contributions. First, they show how a large-scale
similar face dataset (SFD) can be assembled by a combination of automation and a
human in the loop, and divide the dataset into five grades according to different
degrees of similarity. Second, a new fine-grained face feature extraction method
is proposed to solve this problem using an attention mechanism which combines
internal features and external features. The Labeled Faces in the Wild (LFW)
database, CASIA-WebFace and the similar face dataset (SFD) were selected for
experiments. It turns out that the true positive rate is improved by 1.94-5.66%
and the recognition accuracy rate by 2.08-5.8% for the LFW and CASIA-WebFace
databases, respectively. Meanwhile, for SFD, the recognition accuracy rate
improved by 18.80-35.84%. The system faces difficulty in model training and also
has low efficiency.

[2] Face Recognition Using Viola-Jones Depending on Python (2020), proposed by
Khansaa Dheyaa Ismael and Stanciu Irina. This paper proposes a software system
based on face recognition. The proposed system can be implemented in smart
buildings, or in general in any VIP building that needs entry security. The human
face is recognized from a stream of pictures or a video feed; the technology
recognizes the person using the Viola-Jones object detection framework. The task
of the proposed facial recognition system consists of two steps: the first detects
the human face from live video using the computer's web camera, and the second
recognizes whether this face is allowed to enter the building by comparing it with
the existing database. Finally, this proposed software system can be used to
control access in smart buildings, since providing a security system is one of the
most important features to be achieved in smart buildings.

[3] Face Recognition Using Content Based Image Retrieval for Intelligent Security
(2019), proposed by Sri Karnila and Rio Kurniawan. This paper tries to construct
an intelligent security system based on face recognition. The data used in this
research are frontal face images without obstructions and facial images with
obstructions. The authors used the Content Based Image Retrieval (CBIR) method.
Approximately 10,000 images were used in this work, collected from the internet,
the police department office, and direct shooting as primary data. Facial image
data are stored in object-based database files through a process of identification
and facial recognition. Facial images are then retrieved using facial similarity
techniques. In the identification stage, the application can locate the shape of
the frontal face, perform feature extraction, and run intelligent similarity
(face-data matching), which opens the door automatically. This system can be used
to minimize the criminality occurring nowadays, for example for house door
security, office doors, and airport gates. The experimental results show that the
algorithm is quite good at face detection and recognition for opening the door.

[4] An Improved Two-Step Face Recognition Algorithm Based on Sparse Representation
(2019), proposed by Yongjun Zhang and Qian Wang. This paper uses weighted score
fusion, but the weights need to be set manually. The results generally vary
greatly when the weights differ, so it is difficult to find the optimal weights;
this is why different weights must constantly be set for experimental comparison.
In this paper, an improved fusion method is proposed for the above shortcoming,
namely multiplication fusion applicable to sparse representation. The fusion
scheme is not only easy to use but also does not require weights to be set
artificially. Moreover, it is consistent with the correlation between the
classification error and the score obtained in the experimental analysis. In the
field of face recognition, it has been shown that two-step face recognition (TSFR)
based on representation using the original training samples and the generated
"symmetric face" training samples can achieve excellent face recognition
performance. Face recognition based on multiplication fusion and TSFR, as proposed
in this paper, can further improve the recognition accuracy.

[5] Occlusion Aware Facial Expression Recognition Using CNN With Attention
Mechanism (2019), proposed by Yong Li, Jiabei Zeng, Shiguang Shan and Xilin Chen.
In this paper, the authors propose a convolutional neural network with attention
mechanism (ACNN) that can perceive the occluded regions of the face and focus on
the most discriminative un-occluded regions. ACNN is an end-to-end learning
framework. It combines multiple representations from facial regions of interest
(ROIs). Each representation is weighted via a proposed gate unit that computes an
adaptive weight from the region itself according to its unobstructedness and
importance. Considering different ROIs, they introduce two versions of ACNN:
patch-based ACNN (pACNN) and global-local-based ACNN (gACNN). pACNN only pays
attention to local facial patches, while gACNN integrates local representations
at patch level with a global representation at image level. The proposed ACNNs
are evaluated on both real and synthetic occlusions, including a self-collected
facial expression dataset with real-world occlusions, the two largest in-the-wild
facial expression datasets (RAF-DB and AffectNet) and their modifications with
synthesized facial occlusions. Experimental results show that ACNNs improve the
recognition accuracy on both non-occluded and occluded faces, and are capable of
shifting the attention from occluded patches to other related but unobstructed
ones.

[6] Deep Unified Model for Face Recognition Based on Convolution Neural Network
and Edge Computing (2019), proposed by Muhammad Zeeshan Khan, Saad Harous and
Saleet Ul Hassan. This paper proposes an algorithm for face detection and
recognition based on convolutional neural networks (CNN), which outperforms
traditional techniques. In order to validate the efficiency of the proposed
algorithm, a smart classroom for student attendance using face recognition has
been proposed. The face recognition system is trained on the publicly available
Labeled Faces in the Wild (LFW) dataset. The system can detect approximately 35
faces and recognizes 30 of them from a single image of 40 students, and it
achieved 97.9% accuracy on the testing data. Moreover, the data generated by
smart classrooms is computed and transmitted through an IoT-based architecture
using edge computing. A comparative performance study shows that this
architecture outperforms others in terms of data latency and real-time response.

[7] Face-Specific Data Augmentation for Unconstrained Face Recognition (2019),
proposed by Iacopo Masi, Anh Tuan Tran, Tal Hassner, Gozde Sahin and Gerard
Medioni. In this paper, they identify two issues as key to developing effective
face recognition systems: maximizing the appearance variations of training images
and minimizing appearance variations in test images. The former is required to
train the system for whatever appearance variations it will ultimately encounter,
and is often addressed by collecting massive training sets with millions of face
images. The latter involves various forms of appearance normalization for
removing distracting nuisance factors at test time and making test faces easier
to compare. They describe novel, efficient face-specific data augmentation
techniques and show them to be ideally suited for both purposes. Together with
additional technical novelties, they describe a highly effective face recognition
pipeline which obtains state-of-the-art results.

[8] A Fast and Accurate System for Face Detection, Identification, and
Verification (2018), proposed by Rajeev Ranjan and Ankan Bansal. This paper
describes a deep learning pipeline for unconstrained face identification and
verification which achieves state-of-the-art performance on several benchmark
datasets. The authors provide the design details of the various modules involved
in automatic face recognition: face detection, landmark localization and
alignment, and face identification/verification. They propose a novel face
detector, the Deep Pyramid Single Shot Face Detector (DPSSD), which is fast and
detects faces with large scale variations (especially tiny faces). Additionally,
they propose a new loss function, called the Crystal Loss, for the tasks of face
verification and identification. Crystal Loss restricts the feature descriptors
to lie on a hypersphere of a fixed radius, thus minimizing the angular distance
between positive subject pairs and maximizing the angular distance between
negative subject pairs. Evaluation results for the proposed face detector are
provided on challenging unconstrained face detection datasets, followed by
experimental results for end-to-end face verification and identification on the
IARPA Janus Benchmarks A, B and C (IJB-A, IJB-B, IJB-C) and the Janus Challenge
Set 5 (CS5).

[9] Gentle AdaBoost Algorithm Based on Multi-Feature Fusion for Face Detection
(2018), proposed by Zhengqun Wang and Chunlin Xu. There are few types of
Haar-like rectangle features, which leads to the problem that classifier training
takes too long due to the large number of feature quantities required to describe
the face. Local binary patterns (LBP) are used to describe the local texture
features of the face image. Considering the inadequacy of basic LBP features in
face detection, unified MB-LBP features and unified rotation-invariant LBP
features are used to describe the local texture features of faces. Considering
the shortcomings of the MB-LBP feature and the rotation-invariant LBP feature on
face edge information, the edge azimuth field feature based on the Canny operator
is combined with the above two features to describe face information. Finally, a
Gentle AdaBoost classifier is designed to classify all the extracted features.
The experimental results show that the unified MB-LBP feature, the unified
rotation-invariant LBP feature and the edge azimuth field feature based on the
Canny operator can describe face information not only locally but also globally,
which greatly improves the detection rate and detection speed for faces with
multiple poses and different rotation modes.

[10] Learning Discriminative Aggregation Network for Video-Based Face Recognition
and Person Re-identification (2018), proposed by Yongming Rao and Jiwen Lu. In
this paper, the authors propose a discriminative aggregation network method for
video-based face recognition and person re-identification, which aims to
integrate information from video frames for feature representation effectively
and efficiently. Unlike existing video aggregation methods, this method
aggregates raw video frames directly instead of the features obtained by complex
processing. By combining the ideas of metric learning and adversarial learning,
an aggregation network is learned to generate more discriminative images compared
to the raw input frames. The framework reduces the number of image frames per
video to be processed and significantly speeds up the recognition procedure.
Furthermore, low-quality frames containing misleading information can be well
filtered and denoised during the aggregation procedure, which makes the method
more robust and discriminative. Experimental results on several widely used
datasets show that the method can generate discriminative images from video clips
and improve the overall recognition performance, in both speed and accuracy, for
video-based face recognition and person re-identification.

[11] F-DR Net: Face Detection and Recognition in One Net (2018), proposed by Lei
Pang and Yue Ming. Face multi-task analysis has been high-profile in recent
years, and face detection and recognition are more challenging in one net. The
authors present a new parallel network architecture for the two face tasks in one
net, achieving end-to-end face detection and recognition. Firstly, they train a
better face detection network. Then, since the selection of the shared layers has
a significant impact on the speed and accuracy of recognition, they determine the
optimal shared layers by experiment. Finally, because the shared layers contain
discriminative information for face recognition, they put the recognition network
under the shared layers of the detection network. They achieve parallel
end-to-end face detection and recognition in one net, and comprehensively
evaluate this method on several face detection and recognition benchmark
datasets, including Labeled Faces in the Wild (LFW) and the Face Detection
Dataset and Benchmark (FDDB). They obtain better detection and recognition
accuracy on LFW and FDDB, and achieve faster speed compared to other methods. The
results demonstrate the effectiveness of the proposed approach.

[12] Face Recognition Using Composite Features Based on Discriminant Analysis
(2018), proposed by Sung-sin Lee and Sang Tae Choi. Extracting holistic features
from the whole face and extracting local features from sub-images have pros and
cons depending on the conditions. In order to effectively utilize the strengths
of various types of holistic and local features while complementing each
weakness, the authors propose a method to construct a composite feature vector
for face recognition based on discriminant analysis. They first extract the
holistic features and local features from the whole face image and various types
of local images using the discriminant feature extraction method. Then, they
measure the amount of discriminative information in the individual holistic and
local features, and construct composite features from only the discriminative
ones for face recognition. The composite features from the proposed method were
compared, through face recognition experiments on various types of face image
databases, with the holistic features, local features and others prepared by
hybrid methods. The proposed composite feature vector displayed better
performance than the other methods.

[13] Joint Head Pose Estimation and Face Alignment Framework Using Global and
Local CNN Features (2017), proposed by Xiang Xu and Ioannis A. Kakadiaris. In
this paper, the authors explore global and local features obtained from
Convolutional Neural Networks (CNN) for learning to estimate head pose and
localize landmarks jointly. Because there is a high correlation between head pose
and landmark locations, the head pose distributions from a reference database and
learned local deep patch features are used to reduce the error in the head pose
estimation and face alignment tasks. First, GNet is trained on the detected face
region to obtain a rough estimate of the pose and to localize the seven primary
landmarks. The most similar shape is selected for initialization from a reference
shape pool constructed from the training samples according to the estimated head
pose. Starting from the initial pose and shape, LNet is used to learn local CNN
features and predict the shape and pose residuals. The authors demonstrate that
their algorithm, named JFA, improves both head pose estimation and face
alignment. To the best of their knowledge, this is the first system that explores
the use of global and local CNN features to solve head pose estimation and
landmark detection tasks jointly.

[14] Face Retrieval in Video Sequences Using a Single Face Sample (2017),
proposed by Bin Liang, Lihong Zheng and Jiwan Han. Automatic face retrieval or
verification, i.e. identifying whether the target person is the same person, has
received considerable attention from researchers in computer vision. This paper
proposes a method to localize a face in video sequences by considering only one
shot. First, Cascade AdaBoost is applied to identify the face region in the video
sequence. An image enhancement step follows in order to reduce the illumination
variation; it considers the facial region's local mean and standard deviation.
Later, Singular Value Decomposition (SVD) is used to generate any number of
imitated face images for each face by perturbing its n-ordered images.
Consequently, the problem of face retrieval with one sample image becomes a
common face retrieval problem. In addition, the extended t-SNE (t-Distributed
Stochastic Neighbour Embedding) is used to extract condensed facial features,
which largely reduces the computation cost. On the basis of the original single
facial sample and the extended training samples, the proposed method shows better
performance in comparison to other methods; it is therefore effective and
competitive. The proposed method can be easily generalized to other face-related
tasks, such as attribute recognition, general object detection and face
validation.

[15] Face Recognition and Detection Using Neural Networks (2017), proposed by
Vinita Bhandiwad and Bhanu Tekwani. Face recognition is one of the latest
technologies being studied in biometrics, as it has a wide area of applications,
but face detection is one of the challenging problems in image processing. The
basic aim of face detection is to determine whether there is any face in an image
and then locate its position. Evidently, face detection is the first step towards
creating an automated system which may involve other face processing. A neural
network is created and trained with a training set of faces and non-faces. All
results are implemented in the MATLAB 2013 environment.

CHAPTER 3

SYSTEM ARCHITECTURE

This chapter explains about the System Architecture of our project.

3.1 PROPOSED SYSTEM

3.1.1. Loading datasets:

The CBCL Face Database, which includes 2,900 facial images and 28,000 non-facial
images, was used for our face detection model. This dataset contains the faces of
almost 200 people. The dataset used for the face recognition model is the CASIA
WebFace database, which includes 10,575 facial images of various individuals.

3.1.2. Methodology:
Our proposed system comprises two main modules: face detection (to sense faces)
and face recognition (to verify faces). When given a video with faces, the system
will attempt to locate the subject's face, identify the subject's information, and
finally output the image whose identity information has been processed. Figure 3.1
depicts the system flow for detection and recognition.

Fig 3.1 – System Architecture

3.1.3. Video Fragmentation:

Video retrieval is concerned with how to return similar video clips (or scenes,
shots, and frames) to a user given a video query. It involves the following
steps: video segmentation, the step-by-step process in which the video is
converted into scenes, then into shots, then into image frames; and key-frame
selection, which picks the frames containing faces.

3.1.4. Face detection:


The face region is detected from video frames using the Viola-Jones algorithm. To
detect the facial region, a model is established from extracted features, and the
face region is detected by referring to the features in that model. The model is
built using the Viola-Jones algorithm, which has four phases (Haar-like features,
integral image, AdaBoost training and the cascade classifier) for locating the
face region.

3.1.5. Preprocessing:
Preprocessing refers to all the transformations applied to the raw data before it
is fed to the machine learning or deep learning algorithm. For instance, training
a convolutional neural network on raw images will probably lead to bad
classification performance. Preprocessing of the key frames includes the following
steps: the frames are converted into grayscale, the facial region is cropped from
the grayscale frames, and the frames are normalized in order to obtain a similar
data distribution.

3.1.6. Training:

The CNN model is trained with the victims' faces. The preprocessed images are fed
into the CNN model. In the images, facial features such as the eyes, nose and
mouth, which are the keys to distinguishing each face, are detected and extracted
using the convolutional neural network. A unique feature vector, in numeric form,
is developed for each face. These numeric codes are also called face prints; each
code uniquely identifies the person among all the others in the training dataset.

3.1.7. Face Recognition:

The features are extracted from the preprocessed frames and then given for
matching. The code (face print) is compared against a database of other face
prints; the database holds the images to be compared. The system then identifies
a match for the extracted features in the provided database and returns the
matched image together with its label.
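
The report does not list the matching code itself; the sketch below is a minimal
illustration of this step, assuming face prints are stored as NumPy vectors keyed
by label (the names match_face and database, and the distance threshold, are
hypothetical, not the report's own).

import numpy as np

def match_face(probe_print, database, threshold=0.6):
    """Compare a probe face print against labelled prints in the database.

    `database` is assumed to be a dict mapping person labels to 1-D NumPy
    feature vectors (the face prints described above); the threshold is an
    illustrative cut-off. Returns the best label, or None if nothing is close.
    """
    best_label, best_dist = None, float('inf')
    for label, stored in database.items():
        dist = np.linalg.norm(probe_print - stored)  # distance between prints
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= threshold else None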

CHAPTER 4

SYSTEM SPECIFICATION

This chapter includes the System Specification of our project.

The hardware and software for the system are selected by considering factors such
as CPU processing speed, peripheral channel speed, printer speed, seek time and
rotational delay of the hard disk, and communication speed. The hardware and
software specifications are as follows.

4.1 Hardware Requirements:

Processor : 7th Gen Intel(R) Core(TM) i3-7020U

RAM : 4 GB
Hard disk : 1 TB
Mouse : Wired mouse
Monitor : 15 inch

Table 4.1 - Hardware requirements

4.2 Software Requirements:

Operating System : Windows 10


IDE : Anaconda 2019.07(Jupyter Notebook)
Language : Python

Table 4.2 - Software requirements

CHAPTER 5

DESIGN & IMPLEMENTATION

This chapter explains the design and implementation of the proposed face recognition system.

The system is composed of the following modules

1. Video fragmentation
2. Building Viola-Jones model for face detection
3. Preprocessing of detected frames.
4. Training CNN model for face recognition
5. Evaluating the model

5.1 VIDEO FRAGMENTATION

The main purpose of video fragmentation is to capture frames from the input video
and provide them to the face detection framework, since it is impossible to
identify faces directly in a video stream. As a result, when a video is given to
the system, it gets partitioned (as shown in figure 4) into visually and
temporally coherent bits, and a significant key frame, the frame containing faces
for each specified fragment, is extracted and given to the framework.
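
The fragmentation code itself is not reproduced in the report; the following is a
minimal sketch of frame sampling using OpenCV, an assumption about tooling rather
than the authors' exact implementation.

import cv2

def extract_key_frames(video_path, every_n_seconds=1):
    """Split a video into frames, keeping one frame per sampling interval."""
    capture = cv2.VideoCapture(video_path)
    fps = capture.get(cv2.CAP_PROP_FPS) or 25   # fall back if FPS is unknown
    step = int(fps * every_n_seconds)
    frames, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % step == 0:   # keep only the sampled key frames
            frames.append(frame)
        index += 1
    capture.release()
    return frames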

5.2 FACE DETECTION

Human faces are spotted (as shown in figure 5) within a frame of the input video
in this segment, and high-precision face bounding boxes are returned. The system
also has the capability of storing metadata for each detected face for later use.
As a consequence, the faces in the frames are returned by this module.
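
Appendix II implements Viola-Jones from scratch; purely for illustration, the
same detection step can be sketched with OpenCV's pretrained Haar cascade, which
is a stand-in here, not the report's own model.

import cv2

# OpenCV ships a pretrained frontal-face Haar cascade
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

def detect_faces(frame):
    """Return bounding boxes (x, y, w, h) for faces in a single frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)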

5.2.1 HAAR-LIKE FEATURES GENERATION

Eyes, nose, mouth, forehead and other characteristics have been used to determine
whether a picture includes a human face. As a result, Haar-like features have
been created for those characteristics. Since the face includes both dark and
light regions, summing and comparing the pixel values of those regions is a
convenient way to determine which region is lighter or darker. The sum of
darker-area pixel values would be less than the sum of lighter-area pixel values,
and a Haar-like feature can be used to capture this. A Haar-like feature is
created by dividing a rectangular part of an image into multiple sections,
represented as adjacent black and white rectangles (shown in figure 6).

5.2.2 INTEGRAL IMAGE

A fast calculation of summations over image subregions is aided by an integral
image; these summations are needed to evaluate Haar-like features. The following
example illustrates the estimation. Assume an image has a width of w pixels and a
height of h pixels. The integral image would then be w+1 pixels wide and h+1
pixels tall. The integral image's first row and column are all zeros; every other
pixel holds the sum of all pixels above and to the left of it. We can then take
the corresponding corners in the integral image to find the summation of the
intensities inside any rectangular box. The sum over a box is measured as:

Pixel value at bottom right + pixel value at top left - pixel value at top right -
pixel value at bottom left.
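
As a small worked check of this formula (a hand example, not taken from the
report), the snippet below builds the padded integral image of a 2x2 image and
recovers the box sum:

import numpy as np

image = np.array([[1, 2],
                  [3, 4]])
# integral image with an extra zero row/column, as described above
ii = np.zeros((3, 3), dtype=int)
ii[1:, 1:] = image.cumsum(axis=0).cumsum(axis=1)

# sum over the whole 2x2 box: bottom right + top left - top right - bottom left
box_sum = ii[2, 2] + ii[0, 0] - ii[0, 2] - ii[2, 0]
assert box_sum == 1 + 2 + 3 + 4   # equals 10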

5.2.3 ADABOOST TRAINING

AdaBoost is used in the training phase to pick a subset of features and create
the classifier. A large collection of images is generated, whose size corresponds
to the detection window's size. This collection must provide both positive (face)
and negative (non-face) examples for the desired filter. Each image has an index
l, where l = 1...L, and a corresponding label y_l: faces have y_l = 1 and
non-faces have y_l = 0.

Initialize the weights as:

w_(1,l) = 1/(2*P+) if y_l = 1, and w_(1,l) = 1/(2*P-) if y_l = 0,

where P- and P+ are the numbers of non-faces and faces in the image set. The
algorithm is then executed for a chosen number of rounds t = 1...T:

1. Normalize the weights so that they form a probability distribution:

w_(t,l) <- w_(t,l) / (sum over j of w_(t,j))

2. For each feature j, train a classifier h_j that uses only that single feature,
and compute its error rate in terms of the current weights:

e_j = sum over l of w_(t,l) * |h_j(x_l) - y_l|

3. Choose the classifier h_t with the lowest error e_t.

4. Update the weights so that correctly classified examples lose weight:

w_(t+1,l) = w_(t,l) * beta_t^(1 - u_l), where beta_t = e_t / (1 - e_t),

and u_l = 0 if example l is classified correctly and u_l = 1 otherwise.

The final strong classifier is:

h(x) = 1 if (sum over t of alpha_t * h_t(x)) >= (1/2) * (sum over t of alpha_t),
and 0 otherwise,

where alpha_t = log(1 / beta_t).

5.2.4 CASCADE CLASSIFIER

Cascading classifiers are trained using hundreds of positive sample views of a
single face and random negative images of the same scale. After the classifier
has been trained, it can be used to detect the facial region in an image. The
cascade classifier is made up of several stages, each of which contains weak
learners. The classifier advances a region to the next stage if the result is
positive. When the final stage classifies the region as positive, the detector
reports an object (face) located at the current window position. All of these
weak classifiers combine to form a more powerful one.

5.3. PREPROCESSING OF DETECTED FRAMES

The main objective of preprocessing the frames is to enhance the facial image
data by removing unnecessary distortions and strengthening some essential image
features for subsequent processing. It is a critical mechanism because it has a
direct effect on the project's success rate. Since the data in the frames is
unclean, preprocessing reduces the difficulty of the data under study. The
detected frames are cropped, converted into grayscale and finally normalized.
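
A minimal sketch of these three steps, assuming OpenCV and a bounding box from
the detection stage (the helper name is illustrative; the 92x112 target size
matches the CNN input used in Appendix II):

import cv2

def preprocess_face(frame, box, size=(92, 112)):
    """Crop the detected face, convert it to grayscale and normalize pixels."""
    x, y, w, h = box
    face = frame[y:y + h, x:x + w]                  # crop the facial region
    gray = cv2.cvtColor(face, cv2.COLOR_BGR2GRAY)   # grayscale conversion
    gray = cv2.resize(gray, size)                   # resize to the CNN input
    return gray.astype('float32') / 255.0           # normalize to [0, 1]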

5.4 TRAINING CNN MODEL

Face recognition is fulfilled by using a convolutional neural network. It is
generally divided into two stages: training and testing. Before training the CNN
model, the dataset images are split so that 80% are used as training data and 20%
as validation data (shown in figure 15).

Convolutional Neural Networks (CNNs) are a form of neural network that has proven
extremely successful in image recognition and classification. In a CNN, there are
four major operations: convolution, non-linearity (ReLU), subsampling or pooling,
and classification (fully connected layer). Every such network is built on the
foundation of these operations. An image can be interpreted as a matrix of pixel
values. In the case of a CNN, the primary objective of convolution is to extract
features from the input image. By learning image features with small squares of
input data, convolution maintains the spatial relationship between pixels. The
facial region is recognized from the preprocessed video frames using these
learned features.
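
The 80/20 split described above can be sketched with scikit-learn as below; note
that the appendix code actually holds out a smaller validation fraction, and
`images` and `labels` are assumed preprocessed arrays (hypothetical names):

from sklearn.model_selection import train_test_split

# `images` and `labels` are assumed NumPy arrays of preprocessed face images
# and their person IDs; 80% is kept for training and 20% for validation.
x_train, x_valid, y_train, y_valid = train_test_split(
    images, labels, test_size=0.2, random_state=42, stratify=labels)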

5.4.1 CONFUSION MATRIX GENERATION

The confusion matrix was generated (as shown in figure 18) to visualize the
important predictive analytics, such as recall, specificity, accuracy and
precision, as well as to provide comparisons of values such as true positives,
false positives, true negatives and false negatives. It is an N x N matrix for
evaluating a classification model's results, where N is the number of target
classes. The matrix compares the real target values with the machine learning
model's predictions. This provides a detailed picture of how well the
classification model is doing and the types of errors it makes.
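
As an illustration of how those analytics fall out of the matrix (a sketch,
assuming rows hold true labels and columns hold predictions, as the
sklearn.metrics.confusion_matrix call in Appendix II produces):

import numpy as np

def metrics_from_confusion(cm):
    """Derive per-class precision and recall from an N x N confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    true_positives = np.diag(cm)
    precision = true_positives / cm.sum(axis=0)  # column sums: predicted counts
    recall = true_positives / cm.sum(axis=1)     # row sums: actual counts
    accuracy = true_positives.sum() / cm.sum()   # overall fraction correct
    return precision, recall, accuracy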

5.5 EVALUATION OF MODEL

A loss graph and an accuracy graph for training and validation are plotted (as
shown in figure 20). Finally, the model's summary is retrieved and the model is
saved to disk. With the deployment of the Viola-Jones model, the prediction is
made and the output (the detected face region) is obtained on the video frame
after saving the model (shown in figure 17). The final output (the recognized
face image) is achieved by efficiently processing the above-mentioned CNN model,
which achieved the maximum accuracy. The overall precision, recall and accuracy
values for training the model have been summarized and evaluated (as shown in
figure 21).

CHAPTER 6

IMPLEMENTATION

This chapter explains the implementation of our project.

6.1 FUNCTION:

The functioning of all the modules can be well understood by portraying their pseudo
code as follows.

6.1.1 ALGORITHM:

Input : Video Containing the facial image.

Output : The recognized facial image with the label.

Begin

Step 1: Collect the video from the user.

Step 2: Extract the frames from the video.

Step 3: Detect the facial region from the frames.

Step 4: Preprocess the detected frames.

Step 5: Build the CNN model for face recognition.

Step 6: Compare the features of preprocessed frames with that of images in database.

Step 7: Return the matched image together with its label.

End
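
Tying the steps together, a hedged end-to-end sketch follows; extract_key_frames,
detect_faces and preprocess_face refer to the illustrative helpers sketched in
Chapter 5, and cnn_model and class_labels are assumed to be the trained
recognizer and its label list (not code from the report itself).

import numpy as np

def recognize_from_video(video_path, cnn_model, class_labels):
    """Run the full pipeline: fragment, detect, preprocess, recognize."""
    results = []
    for frame in extract_key_frames(video_path):      # Step 2: frame extraction
        for box in detect_faces(frame):               # Step 3: face detection
            face = preprocess_face(frame, box)        # Step 4: crop/gray/normalize
            batch = face.reshape(1, 112, 92, 1)       # shape the CNN expects
            scores = cnn_model.predict(batch)         # Steps 5-6: CNN comparison
            results.append(class_labels[int(np.argmax(scores))])  # Step 7: label
    return results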

CHAPTER 7

CONCLUSION AND FUTURE WORK

This chapter presents the conclusion of the project and directions for future work.

With the rapid development of video monitoring, which plays a crucial part in
society by detecting crime in public areas, identity authentication has also
become an indispensable part of people's lives, so people put forward higher
requirements on safety, reliability of identification and detection, and
accuracy. In this paper, the Viola-Jones algorithm has been implemented
successfully, with an accuracy of 93%, to identify the faces in a video. The
proposed technique performs well with data preprocessing steps such as grayscale
conversion and normalization. Although the method has a high level of face
detection accuracy, the system has been tested only on frontal facial images; it
could be expanded to recognize rotating faces in real-time videos. The model is
trained on the CASIA WebFace face image dataset to provide additional data that
can be used to enhance accuracy.

The confusion matrix was created to provide a visual representation of the CNN
model that had been trained to recognize faces and to compare the real target
values with the model's predictions.

This system has been developed to be extremely useful and to provide protection
at a greater level in emergency situations. As a result, our system would
decrease the number of crimes committed in high-traffic areas. Future work on the
proposed system would provide a larger number of rotated images at various scales
for identification, to make the system work well in different conditions.
Furthermore, a new collection of features may be added to the features used in
this paper to improve the system's performance. The experimental results show
that the proposed algorithm has a higher detection rate and better performance.

CHAPTER 8

This chapter presents the appendices of our project.

APPENDIX - I

8.1 SNAPSHOTS OF OUTPUT

VIDEO FRAGMENTATION

VIOLA-JONES TRAINING

FACE DETECTION

PREPROCESSING

Cropping, Grayscale Conversion, Normalization of Pixels

BUILDING CNN MODEL

ACCURACY - LOSS COMPARISON GRAPH AND CONFUSION MATRIX

FACE RECOGNITION:

APPENDIX – II

8.2 SOURCE CODE

VIOLA-JONES ALGORITHM IMPLEMENTATION

import numpy as np
import math
import pickle
from sklearn.feature_selection import SelectPercentile, f_classif

class ViolaJones:
    def __init__(self, T=10):
        self.T = T          # number of boosting rounds / weak classifiers
        self.alphas = []
        self.clfs = []

    def train(self, training, pos_num, neg_num):
        weights = np.zeros(len(training))
        training_data = []
        print("Computing integral images")
        for x in range(len(training)):
            training_data.append((integral_image(training[x][0]), training[x][1]))
            # initialize weights: 1/(2*pos_num) for faces, 1/(2*neg_num) otherwise
            if training[x][1] == 1:
                weights[x] = 1.0 / (2 * pos_num)
            else:
                weights[x] = 1.0 / (2 * neg_num)
        print("Building features")
        features = self.build_features(training_data[0][0].shape)
        print("Applying features to training examples")
        X, y = self.apply_features(features, training_data)
        print("Selecting best features")
        indices = SelectPercentile(f_classif, percentile=10).fit(X.T, y).get_support(indices=True)
        X = X[indices]
        features = features[indices]
        print("Selected %d potential features" % len(X))
        for t in range(self.T):
            weights = weights / np.linalg.norm(weights)
            weak_classifiers = self.train_weak(X, y, features, weights)
            clf, error, accuracy = self.select_best(weak_classifiers, weights, training_data)
            beta = error / (1.0 - error)
            for i in range(len(accuracy)):
                weights[i] = weights[i] * (beta ** (1 - accuracy[i]))
            alpha = math.log(1.0 / beta)
            self.alphas.append(alpha)
            self.clfs.append(clf)
            print("Chose classifier: %s with accuracy: %f and alpha: %f" % (
                str(clf), len(accuracy) - sum(accuracy), alpha))

    def train_weak(self, X, y, features, weights):
        total_pos, total_neg = 0, 0
        for w, label in zip(weights, y):
            if label == 1:
                total_pos += w
            else:
                total_neg += w
        classifiers = []
        total_features = X.shape[0]
        for index, feature in enumerate(X):
            if len(classifiers) % 1000 == 0 and len(classifiers) != 0:
                print("Trained %d classifiers out of %d" % (len(classifiers), total_features))
            applied_feature = sorted(zip(weights, feature, y), key=lambda x: x[1])
            pos_seen, neg_seen = 0, 0
            pos_weights, neg_weights = 0, 0
            min_error, best_feature, best_threshold, best_polarity = float('inf'), None, None, None
            for w, f, label in applied_feature:
                error = min(neg_weights + total_pos - pos_weights,
                            pos_weights + total_neg - neg_weights)
                if error < min_error:
                    min_error = error
                    best_feature = features[index]
                    best_threshold = f
                    best_polarity = 1 if pos_seen > neg_seen else -1
                if label == 1:
                    pos_seen += 1
                    pos_weights += w
                else:
                    neg_seen += 1
                    neg_weights += w
            clf = WeakClassifier(best_feature[0], best_feature[1], best_threshold, best_polarity)
            classifiers.append(clf)
        return classifiers

    def build_features(self, image_shape):
        height, width = image_shape
        features = []
        for w in range(1, width + 1):
            for h in range(1, height + 1):
                i = 0
                while i + w < width:
                    j = 0
                    while j + h < height:
                        # 2-rectangle features
                        immediate = RectangleRegion(i, j, w, h)
                        right = RectangleRegion(i + w, j, w, h)
                        if i + 2 * w < width:  # horizontally adjacent
                            features.append(([right], [immediate]))
                        bottom = RectangleRegion(i, j + h, w, h)
                        if j + 2 * h < height:  # vertically adjacent
                            features.append(([immediate], [bottom]))
                        right_2 = RectangleRegion(i + 2 * w, j, w, h)
                        # 3-rectangle features
                        if i + 3 * w < width:  # horizontally adjacent
                            features.append(([right], [right_2, immediate]))
                        bottom_2 = RectangleRegion(i, j + 2 * h, w, h)
                        if j + 3 * h < height:  # vertically adjacent
                            features.append(([bottom], [bottom_2, immediate]))
                        # 4-rectangle features
                        bottom_right = RectangleRegion(i + w, j + h, w, h)
                        if i + 2 * w < width and j + 2 * h < height:
                            features.append(([right, bottom], [immediate, bottom_right]))
                        j += 1
                    i += 1
        return np.array(features)

    def select_best(self, classifiers, weights, training_data):
        best_clf, best_error, best_accuracy = None, float('inf'), None
        for clf in classifiers:
            error, accuracy = 0, []
            for data, w in zip(training_data, weights):
                correctness = abs(clf.classify(data[0]) - data[1])
                accuracy.append(correctness)
                error += w * correctness
            error = error / len(training_data)
            if error < best_error:
                best_clf, best_error, best_accuracy = clf, error, accuracy
        return best_clf, best_error, best_accuracy

    def apply_features(self, features, training_data):
        X = np.zeros((len(features), len(training_data)))
        y = np.array(list(map(lambda data: data[1], training_data)))
        i = 0
        for positive_regions, negative_regions in features:
            feature = lambda ii: sum([pos.compute_feature(ii) for pos in positive_regions]) - \
                                 sum([neg.compute_feature(ii) for neg in negative_regions])
            X[i] = list(map(lambda data: feature(data[0]), training_data))
            i += 1
        return X, y

    def classify(self, image):
        total = 0
        ii = integral_image(image)
        for alpha, clf in zip(self.alphas, self.clfs):
            total += alpha * clf.classify(ii)
        return 1 if total >= 0.5 * sum(self.alphas) else 0

    def save(self, filename):
        with open(filename + ".pkl", 'wb') as f:
            pickle.dump(self, f)

    @staticmethod
    def load(filename):
        with open(filename + ".pkl", 'rb') as f:
            return pickle.load(f)

class WeakClassifier:
    def __init__(self, positive_regions, negative_regions, threshold, polarity):
        self.positive_regions = positive_regions
        self.negative_regions = negative_regions
        self.threshold = threshold
        self.polarity = polarity

    def classify(self, x):
        feature = lambda ii: sum([pos.compute_feature(ii) for pos in self.positive_regions]) - \
                             sum([neg.compute_feature(ii) for neg in self.negative_regions])
        return 1 if self.polarity * feature(x) < self.polarity * self.threshold else 0

    def __str__(self):
        return "Weak Clf (threshold=%d, polarity=%d, %s, %s)" % (
            self.threshold, self.polarity,
            str(self.positive_regions), str(self.negative_regions))

class RectangleRegion:
    def __init__(self, x, y, width, height):
        self.x = x
        self.y = y
        self.width = width
        self.height = height

    def compute_feature(self, ii):
        # sum of pixels inside the rectangle, read off the integral image
        return ii[self.y + self.height][self.x + self.width] + ii[self.y][self.x] - \
               (ii[self.y + self.height][self.x] + ii[self.y][self.x + self.width])

    def __str__(self):
        return "(x= %d, y= %d, width= %d, height= %d)" % (self.x, self.y, self.width, self.height)

    def __repr__(self):
        return "RectangleRegion(%d, %d, %d, %d)" % (self.x, self.y, self.width, self.height)

def integral_image(image):
    ii = np.zeros(image.shape)
    s = np.zeros(image.shape)
    for y in range(len(image)):
        for x in range(len(image[y])):
            s[y][x] = s[y-1][x] + image[y][x] if y-1 >= 0 else image[y][x]
            ii[y][x] = ii[y][x-1] + s[y][x] if x-1 >= 0 else s[y][x]
    return ii

# Building the cascade classifier
from violajones import ViolaJones
import pickle

class CascadeClassifier():
    def __init__(self, layers):
        self.layers = layers
        self.clfs = []

    def train(self, training):
        pos, neg = [], []
        for ex in training:
            if ex[1] == 1:
                pos.append(ex)
            else:
                neg.append(ex)
        for feature_num in self.layers:
            if len(neg) == 0:
                print("Stopping early. FPR = 0")
                break
            clf = ViolaJones(T=feature_num)
            clf.train(pos + neg, len(pos), len(neg))
            self.clfs.append(clf)
            # keep only the false positives as negatives for the next stage
            false_positives = []
            for ex in neg:
                if self.classify(ex[0]) == 1:
                    false_positives.append(ex)
            neg = false_positives

    def classify(self, image):
        for clf in self.clfs:
            if clf.classify(image) == 0:
                return 0
        return 1

    def save(self, filename):
        with open(filename + ".pkl", 'wb') as f:
            pickle.dump(self, f)

    @staticmethod
    def load(filename):
        with open(filename + ".pkl", 'rb') as f:
            return pickle.load(f)

# Training the CNN model
import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout
from keras.optimizers import Adam
from keras.callbacks import TensorBoard
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import roc_curve, auc
from sklearn.metrics import accuracy_score
from keras.utils import np_utils
import itertools

data = np.load('/content/drive/MyDrive/Face-recognition-using-CNN-master/ORL_faces/ORL_faces.npz')

# load the train images and normalize every image to [0, 1]
x_train = data['trainX']
x_train = np.array(x_train, dtype='float32') / 255
x_test = data['testX']
x_test = np.array(x_test, dtype='float32') / 255
# load the labels of the images
y_train = data['trainY']
y_test = data['testY']

# show the train and test data format
print('x_train : {}'.format(x_train[:]))
print('y_train: {}'.format(y_train))
print('x_test shape: {}'.format(x_test.shape))

# hold out part of the training data for validation
x_train, x_valid, y_train, y_valid = train_test_split(
    x_train, y_train, test_size=.05, random_state=1234)

im_rows = 112
im_cols = 92
batch_size = 512
im_shape = (im_rows, im_cols, 1)
x_train = x_train.reshape(x_train.shape[0], *im_shape)
x_test = x_test.reshape(x_test.shape[0], *im_shape)
x_valid = x_valid.reshape(x_valid.shape[0], *im_shape)
print('y_train count: {}'.format(y_train.shape[0]))
print('y_test shape: {}'.format(y_test.shape))

# model definition
cnn_model = Sequential([
    Conv2D(filters=36, kernel_size=7, activation='relu', input_shape=im_shape),
    MaxPooling2D(pool_size=2),
    Conv2D(filters=54, kernel_size=5, activation='relu'),
    MaxPooling2D(pool_size=2),
    Flatten(),
    Dense(2024, activation='relu'),
    Dropout(0.5),
    Dense(1024, activation='relu'),
    Dropout(0.5),
    Dense(512, activation='relu'),
    Dropout(0.5),
    # 20 is the number of outputs (one class per subject)
    Dense(20, activation='softmax')
])
cnn_model.compile(
    # sparse_categorical_crossentropy works directly on integer labels,
    # so the targets do not need one-hot encoding
    loss='sparse_categorical_crossentropy',
    optimizer=Adam(lr=0.0001),  # newer Keras versions use learning_rate=0.0001
    metrics=['accuracy']
)
cnn_model.summary()
history = cnn_model.fit(
    np.array(x_train), np.array(y_train), batch_size=batch_size,
    epochs=250, verbose=2,
    validation_data=(np.array(x_valid), np.array(y_valid)),
)
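
After training, the verifier can be saved so the monitoring pipeline does not have to retrain on every run. Below is a minimal sketch using Keras' standard save/load API; the filename face_cnn.h5 is our choice for illustration, not fixed by the project.

# Persist the trained CNN (filename is illustrative)
cnn_model.save('face_cnn.h5')

# Reload it later, e.g. inside the monitoring service
from keras.models import load_model
verifier = load_model('face_cnn.h5')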

# plot accuracy-loss graphs


print(history.history.keys())
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
predicted = np.array(cnn_model.predict(x_test))
# predict_classes() was removed in newer Keras; taking the argmax of the
# softmax outputs yields the same predicted class indices
ynew = np.argmax(predicted, axis=1)
Acc = accuracy_score(y_test, ynew)
print("accuracy : ")
print(Acc)

# generate confusion matrix


cnf_matrix = confusion_matrix(np.array(y_test), ynew)
# one-hot encoded labels (kept for completeness, e.g. for ROC analysis)
y_test1 = np_utils.to_categorical(y_test, 20)
def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        #print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.

    # annotate each cell with its count, switching text colour for contrast
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.show()

print('Confusion matrix, without normalization')
print(cnf_matrix)

plt.figure()
# slices fixed so each 10x10 block lines up with its ten class labels
plot_confusion_matrix(cnf_matrix[0:10, 0:10], classes=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                      title='Confusion matrix, without normalization')

plt.figure()
plot_confusion_matrix(cnf_matrix[10:20, 10:20], classes=[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
                      title='Confusion matrix, without normalization')

print("Confusion matrix:\n%s" % confusion_matrix(np.array(y_test), ynew))
print(classification_report(np.array(y_test), ynew))

CHAPTER 9

REFERENCES

This chapter lists the references cited in this project.
[1] Rajeev Ranjan, Ankan Bansal, Jingxiao Zheng, Hongyu Xu, Joshua Gleason, Boyu Lu, Anirudh Nanduri, Jun-Cheng Chen, Carlos D. Castillo and Rama Chellappa, "A Fast and Accurate System for Face Detection, Identification, and Verification," IEEE Transactions on Biometrics, Behavior, and Identity Science, 2019.

[2] Mangayarkarasi Nehru and S. Padmavathi, "Illumination Invariant Face Detection Using Viola Jones Algorithm," International Conference on Advanced Computing and Communication Systems (ICACCS), 2017.

[3] Xiang Xu and Ioannis A. Kakadiaris, "Joint Head Pose Estimation and Face Alignment Framework Using Global and Local CNN Features," IEEE International Conference on Automatic Face & Gesture Recognition, 2017.

[4] V. Bhandiwad and B. Tekwani, "Face Recognition and Detection Using Neural Networks," International Conference on Trends in Electronics and Informatics (ICEI), 2017.

[5] M. N. Chaudhari, M. Deshmukh, G. Ramrakhiani and R. Parvatikar, "Face Detection Using Viola Jones Algorithm and Neural Networks," Fourth International Conference on Computing Communication Control and Automation, 2018.

[6] Arne Schumann, Andreas Specker and Jurgen Beyerer, "Attribute-based Person Retrieval and Search in Video Sequences," International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2018.

[7] Haofei Wang, Bertram E. Shi and Yiwen Wang, "Convolutional Neural Network for Target Face Detection Using Single-Trial EEG Signal," Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2018.

[8] Wenqi Wu, Yingjie Yin, Xingang Wang and De Xu, "Face Detection With Different Scales Based on Faster R-CNN," IEEE Transactions on Cybernetics, 2018.

[9] Sang-Il Choi, Sung-Sin Lee, Sang Tae Choi and Won-Yong Shin, "Face Recognition Using Composite Features Based on Discriminant Analysis," IEEE Access, 2018.

[10] Tiago de Freitas Pereira, Andre Anjos and Sebastien Marcel, "Heterogeneous Face Recognition Using Domain Specific Units," IEEE Transactions on Information Forensics and Security, 2018.

[11] Chen Yan, Zhengqun Wang and Chunlin Xu, "Gentle AdaBoost Algorithm Based on Multi-feature Fusion for Face Detection," IEEE Conference on Automation, 2018.

[12] Yongjun Zhang, Qian Wang, Ling Xiao and Zhongwei Cui, "An Improved Two-Step Face Recognition Algorithm Based on Sparse Representation," IEEE Access, 2019.

[13] Yong Li, Jiabei Zeng, Shiguang Shan and Xilin Chen, "Occlusion Aware Facial Expression Recognition Using CNN With Attention Mechanism," IEEE Transactions on Image Processing, 2019.

[14] Muhammad Zeeshan Khan, Saad Harous and Saleet Ul Hassan, "Deep Unified Model for Face Recognition Based on Convolution Neural Network and Edge Computing," IEEE Access, 2019.

[15] Haonan Chen, Yaowu Chen, Xiang Tian and Rongxin Jiang, "A Cascade Face Spoofing Detector Based on Face Anti-Spoofing R-CNN and Improved Retinex LBP," IEEE Access, 2019.

[16] Manminder Singh and Ajat Shatru Arora, "Computer Aided Face Liveness Detection with Facial Thermography," Wireless Personal Communications, Springer, 2019.

[17] An-Ping Song, Qian Hu, Xue-Hai Ding, Xin-Yi Di and Zheng Song, "Similar Face Recognition Using the IE-CNN Model," IEEE Access, 2020.

[18] Zuolin Dong, Jiahong Wei, Xiaoyu Chen and Pengfei Zheng, "Face Detection in Security Monitoring Based on Artificial Intelligence Video Retrieval Technology," IEEE Access, 2020.

[19] Fatimah Khalid, Noor Amjeed, Rahmita Wirza O. K. Wirza, Hizmawati Madzin and Illiana Azizan, "Face Recognition for Varying Illumination and Different Optical Zoom Using a Combination of Binary and Geometric Features," 2020.

[20] Wenyun Sun, Yu Song, Haitao Zhao and Zhong Jin, "A Face Spoofing Detection Method Based on Domain Adaptation and Lossless Size Adaptation," IEEE Access, 2020.
