Intelligent Security Monitoring System With Video Based Face Recognition - Documentation
Submitted by
LOGESH V S 1807029
NAGALAKSHMI R 1807033
SHANTHINI G 1807046
BACHELOR OF TECHNOLOGY
in
INFORMATION TECHNOLOGY
MARCH 2021
COIMBATORE INSTITUTE OF TECHNOLOGY
(A Govt. Aided Autonomous Institution Affiliated to Anna University)
COIMBATORE – 641014
BONAFIDE CERTIFICATE
Prof.N.K.KARTHIKEYAN, Mr.N.SELVAMUTHUKUMARAN,
HEAD OF THE DEPARTMENT, SUPERVISOR,
Department of Information Technology, Department of Information Technology,
Coimbatore Institute of Technology, Coimbatore Institute of Technology,
Coimbatore - 641014. Coimbatore - 641014.
Certified that the candidates were examined by us in the project work viva-voce
examination held on …………………
TABLE OF CONTENTS
ACKNOWLEDGEMENT
ABSTRACT
1 INTRODUCTION
2 LITERATURE SURVEY
3 SYSTEM ARCHITECTURE
4 SYSTEM SPECIFICATION
5 DESIGN & IMPLEMENTATION
5.2.4 CASCADE CLASSIFIER
5.3 PREPROCESSING OF FRAMES
6 IMPLEMENTATION
6.1 FUNCTION
6.1.1 ALGORITHM
7 CONCLUSION
8 APPENDIX
APPENDIX – I
APPENDIX – II
9 REFERENCES
ACKNOWLEDGEMENT
Our project “Intelligent Security Monitoring System with video based face
recognition” has been the result of motivation and encouragement from many, whom
we would like to thank.
During the entire period of study, the staff members of the Department of
Computer Science and Engineering & Information Technology have offered
ungrudging help. It is also a great pleasure to acknowledge the unfailing help we
have received from our friends.
It is a matter of great pleasure to thank our parents and family members for
their constant support and cooperation in the pursuit of this endeavour.
ABSTRACT
Areas with a large flow of people, such as airports and border control zones, face
frequent emergency situations and therefore require a higher degree of security to
prevent unwanted coercive incidents. The security provided in such settings is
traditional, and monitoring crime with limited manpower makes complete security
difficult to guarantee. Another major issue is that the high volume of video data
makes manual video analysis complex. Intelligent video retrieval technology has
become a crucial part of video monitoring, and face recognition has proven very
effective in security-critical environments. Hence, this system has been developed to
recognize the faces of suspects using the Viola-Jones algorithm for face detection; it
is capable of identifying a person from a video frame, bringing better accuracy and
establishing stability in security. The system also applies a convolutional neural
network to process the image information from the video and verify the person. The
faces in the surveillance video are extracted and recorded in real time, and with a
deep learning model built on a convolutional neural network, single-face and
multi-face images are detected and recognized to effectively assist security
personnel in dealing with a crisis. The system not only has high academic value, but
will also contribute greatly to national security, social stability and so on.
LIST OF ABBREVIATIONS
ABBREVIATION EXPANSION
AI Artificial Intelligence
ML Machine Learning
CHAPTER 1
INTRODUCTION
Machine learning is one of the most fascinating technologies ever introduced.
As the name implies, it gives a machine the ability to learn, which makes it more
human-like. Machine learning is currently in use in far more places than one might
think; we probably employ a learning algorithm on a regular basis without even
realizing it. It has a variety of applications, including web search engines, photo
tagging applications and spam detectors.
Face recognition is a technique for recognizing or confirming an individual's
identity by looking at their face. A face recognition system can recognize individuals
in photographs, videos, or in real time. It employs computer algorithms to identify
unique, distinguishing features on a person's face. These details, such as eye
distance or chin shape, are then transformed into a mathematical representation and
compared to data from other faces in a face recognition database. A face prototype
is data about a specific face; it differs from an image in that it is structured to contain
only the information that can be used to differentiate one face from another.
Face detection is a crucial step in the process because it identifies and locates
human faces in photographs and videos. The Viola-Jones detection framework
combines the concepts of Haar-like features, integral images, the AdaBoost
algorithm and the cascade classifier to construct a fast and accurate face detection
system. A Haar-like feature is made up of darker and lighter areas; the sum of
light-region intensities is subtracted from the sum of dark-region intensities, yielding
a single value. Haar-like features enable us to extract useful image information for
processing, such as edges and diagonal and straight lines, that is suitable for
sensing a face. Since extracting Haar-like features requires calculating the totals of
dark and light rectangular regions, representing the image as an integral image (as
given in figure 8) cuts down the computation time and allows for quick feature
evaluation.
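To make these two ideas concrete, the following minimal sketch (our own NumPy illustration, separate from the project listing in Appendix II) builds an integral image and evaluates a simple two-rectangle Haar-like feature as the dark-region sum minus the light-region sum:

import numpy as np

def integral(img):
    # ii[y, x] = sum of all pixels above and to the left of (y, x), inclusive
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, x, y, w, h):
    # Sum of the w-by-h rectangle at top-left (x, y), using four corner lookups
    total = ii[y + h - 1, x + w - 1]
    if x > 0:
        total -= ii[y + h - 1, x - 1]
    if y > 0:
        total -= ii[y - 1, x + w - 1]
    if x > 0 and y > 0:
        total += ii[y - 1, x - 1]
    return total

img = np.arange(36, dtype=np.float64).reshape(6, 6)
ii = integral(img)
# Two-rectangle (edge) feature: left half treated as dark, right half as light
feature_value = rect_sum(ii, 0, 0, 3, 6) - rect_sum(ii, 3, 0, 3, 6)

However many rectangles a feature has, its value costs only a handful of table lookups once the integral image is built, which is what makes exhaustive feature evaluation feasible.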
Fig 1.1 Viola Jones Framework
The main job of the system is to verify whether or not an input image
is that of the claimed individual, given the input image and a name or ID of a person.
This process is carried out by convolutional neural networks, a form of neural
network that works particularly well for images because it provides building blocks
with fewer parameters and perceives different items at each layer of the network.
With training, ConvNets are able to learn these filters/characteristics. The
architecture of a ConvNet is analogous to the connectivity pattern of neurons in the
human brain and was inspired by the organization of the visual cortex. Individual
neurons respond to stimuli only in a restricted region of the visual field known as the
receptive field. A collection of such fields overlaps to cover the entire visual area.
The pre-warning analysis of captured video images is the key priority of
video surveillance systems in the field of public security, but post-hoc video analysis
wastes a lot of manpower and resources. As a result, facial recognition technology is
being used in public security video surveillance, which will reduce the risk of criminal
activity and preserve social stability. In short, video surveillance, which is needed for
protection, has begun to shift from ‘‘visible" to ‘‘comprehensible". The versatile
nature of face recognition technology made us aware of the importance of
broadening and deepening the application level of face recognition.
CHAPTER 2
LITERATURE SURVEY
person by using the Viola-Jones object detection framework. The proposed facial
recognition system consists of two steps: the first detects the human face from live
video using the computer's web camera, and the second recognizes whether this
face is allowed to enter the building or not by comparing it with the existing
database. Finally, the proposed software system can be used to control access in
smart buildings, since providing a security system is one of the most important
features that must be achieved in a smart building.
[3] Face Recognition using Content Based Image Retrieval for Intelligent
Security (2019), proposed by Sri Karnila and Rio Kurniawan. This paper tries to
construct an intelligent security system based on face recognition. The data used in
this research are frontal face images without obstacles, and facial images with
obstacles. The authors used the Content Based Image Retrieval (CBIR) method.
Approximately 10,000 images were used in this work, collected from the internet, a
police department office, and direct shooting as primary data. Facial image data are
stored in object-based database files through a process of identification and facial
recognition, after which facial images are retrieved using facial similarity techniques.
In the identification stage, the application specifies the shape of the frontal face,
performs feature extraction, and runs an intelligent similarity check (matching face
data) that opens the door automatically. This system can be used to minimize the
criminality that occurs nowadays, for example for house door security, office doors,
and airport gates. The experimental results show that the algorithm is quite good at
face detection and recognition for opening the door.
applicable to sparse representation. The fusion scheme is not only easy to use but
also does not require manually set weights; moreover, it is consistent with the
correlation between the classification error and the score obtained in the
experimental analysis. In the field of face recognition, it has been shown that
two-step face recognition (TSFR), based on representation using the original training
samples and the generated "symmetric face" training samples, can achieve excellent
face recognition performance. Face recognition based on multiplication fusion and
TSFR, as proposed in this paper, can further improve the recognition accuracy.
outperform the traditional techniques. To validate the efficiency of the proposed
algorithm, a smart classroom for student attendance using face recognition has been
proposed. The face recognition system is trained on the publicly available Labeled
Faces in the Wild (LFW) dataset. The system can detect approximately 35 faces and
recognizes 30 of them from a single image of 40 students, and achieved 97.9%
accuracy on the testing data. Moreover, the data generated by smart classrooms is
computed and transmitted through an IoT-based architecture using edge computing.
A comparative performance study shows that the architecture outperforms others in
terms of data latency and real-time response.
[8] A Fast and Accurate System for Face Detection, Identification, and
Verification (2018), proposed by Rajeev Ranjan and Ankan Bansal. This paper
describes a deep learning pipeline for unconstrained face identification and
verification which achieves state-of-the-art performance on several benchmark
datasets. It provides the design details of the various modules involved in automatic
face recognition: face detection, landmark localization and alignment, and face
identification/verification. The authors propose a novel face detector, the Deep
Pyramid Single Shot Face Detector (DPSSD), which is fast and detects faces with
large scale variations (especially tiny faces). Additionally, they propose a new loss
function, called the Crystal Loss, for the tasks of face verification and identification.
Crystal Loss restricts the feature descriptors to lie on a hypersphere of a fixed radius,
thus minimizing the angular distance between positive subject pairs and maximizing
the angular distance between negative subject pairs. The paper provides evaluation
results of the proposed face detector on challenging unconstrained face detection
datasets, and presents experimental results for end-to-end face verification and
identification on the IARPA Janus Benchmarks A, B and C (IJB-A, IJB-B, IJB-C) and
the Janus Challenge Set 5 (CS5).
integrate information from video frames for feature representation effectively and
efficiently. Unlike existing video aggregation methods, this method aggregates raw
video frames directly instead of features obtained by complex processing. By
combining the ideas of metric learning and adversarial learning, an aggregation
network is learned to generate more discriminative images compared to the raw
input frames. The framework reduces the number of image frames to be processed
per video and significantly speeds up the recognition procedure. Furthermore,
low-quality frames containing misleading information can be well filtered and
denoised during the aggregation procedure, which makes the method more robust
and discriminative. Experimental results on several widely used datasets show that
the method can generate discriminative images from video clips and improve the
overall recognition performance in both speed and accuracy for video-based face
recognition and person re-identification.
[11] F-DR Net: Face Detection And Recognition In One Net (2018), proposed by
Lei Pang and Yue Ming. Face multi-task analysis has been high-profile in recent
years, and performing face detection and recognition in one net is challenging. The
paper presents a new parallel network architecture for the two face tasks in one net,
achieving end-to-end face detection and recognition. First, a better face detection
network is trained. Then, since the selection of the shared layers has a significant
impact on the speed and accuracy of recognition, the optimal shared layers are
determined by experiments. Finally, because the shared layers contain
discriminative information for face recognition, the recognition network is placed
under the shared layers of the detection network. The method achieves parallel
end-to-end face detection and recognition in one net and is comprehensively
evaluated on several face detection and recognition benchmark datasets, including
Labeled Faces in the Wild (LFW) and the Face Detection Dataset and Benchmark
(FDDB). It obtains better detection and recognition accuracy on LFW and FDDB,
and achieves faster speed compared to other methods, demonstrating the
effectiveness of the proposed approach.
from the whole face and extracting the local features from sub-images have pros
and cons depending on the conditions. In order to effectively utilize the strengths of
various types of holistic and local features while complementing their weaknesses,
the authors propose a method to construct a composite feature vector for face
recognition based on discriminant analysis. They first extract the holistic features
and local features from the whole face image and various types of local images
using the discriminant feature extraction method. Then, they measure the amount of
discriminative information in the individual holistic and local features, and construct
composite features from only the discriminative ones for face recognition. The
composite features from the proposed method were compared with holistic features,
local features, and others prepared by hybrid methods in face recognition
experiments on various types of face image databases. The proposed composite
feature vector displayed better performance than the other methods.
[13] Joint Head Pose Estimation and Face Alignment Framework Using
Global and Local CNN Features (2017), proposed by Xiang Xu and Ioannis A.
Kakadiaris. This paper explores global and local features obtained from
Convolutional Neural Networks (CNNs) for learning to estimate head pose and
localize landmarks jointly. Because there is a high correlation between head pose
and landmark locations, the head pose distributions from a reference database and
learned local deep patch features are used to reduce the error in the head pose
estimation and face alignment tasks. First, GNet is trained on the detected face
region to obtain a rough estimate of the pose and to localize the seven primary
landmarks. The most similar shape is selected for initialization from a reference
shape pool constructed from the training samples according to the estimated head
pose. Starting from the initial pose and shape, LNet is used to learn local CNN
features and predict the shape and pose residuals. The authors demonstrate that
their algorithm, named JFA, improves both head pose estimation and face
alignment; to the best of their knowledge, it is the first system that explores the use
of global and local CNN features to solve head pose estimation and landmark
detection tasks jointly.
[14] Face Retrieval in Video Sequences Using a Single Face Sample (2017),
proposed by Bin Liang, Lihong Zheng and Jiwan Han. Automatic face retrieval or
verification, the problem of identifying whether a target person is the same person,
has received considerable attention from researchers in computer vision. This paper
proposes a method to localize a face from video sequences by considering only one
shot. First, Cascade AdaBoost is applied to identify the region of a face in the video
sequence. An image enhancement step follows in order to reduce the illumination
variation; the enhancement considers the facial region's local mean and standard
deviation. Later, Singular Value Decomposition (SVD) is used to generate any
number of imitated face images for each face by perturbing its n-ordered images.
Consequently, the problem of face retrieval with one sample image becomes a
common face retrieval problem. In addition, the extended t-SNE (t-Distributed
Stochastic Neighbour Embedding) is used to extract condensed facial features,
which largely reduces the computation cost. On the basis of the original single facial
sample and the extended training samples, the proposed method shows better
performance in comparison to other methods and is therefore effective and
competitive. It can be easily generalized to other face-related tasks, such as attribute
recognition, general object detection, and face validation.
CHAPTER 3
SYSTEM ARCHITECTURE
3.1.1. Loading datasets:
The CBCL Face Database, which was used for our face detection model,
includes 2,900 facial images and 28,000 non-facial images covering almost 200
people's faces. The dataset used for the face recognition model is the CASIA
WebFace database, which includes 10,575 facial images of various individuals.
3.1.2. Methodology:
Our proposed system comprises two main modules: face detection (to sense
faces) and face recognition (to verify faces). Given a video with faces, the system
will attempt to locate the subject's face, identify the subject's information, and finally
output the image whose identity information has been processed. Figure 3.1 depicts
the system flow for detection and recognition.
3.1.3. Video Fragmentation:
The input video is partitioned into visually and temporally coherent fragments,
and a significant key frame containing faces is extracted from each fragment for face
detection.
3.1.5. Preprocessing:
Preprocessing refers to all the transformations applied to the raw data before it
is fed to the machine learning or deep learning algorithm. For instance, training a
convolutional neural network on raw images will probably lead to bad classification
performance. Preprocessing of the key frames includes the following steps: the
frames are converted into grayscale, the facial region is cropped from the grayscale
frames, and the frames are normalized in order to obtain a similar data distribution.
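A minimal sketch of these three steps, assuming OpenCV (cv2) is available and that (x, y, w, h) is a face bounding box produced by the detector (the 92 x 112 target size matches the CNN input shape used in Appendix II):

import cv2
import numpy as np

def preprocess_frame(frame, box, size=(92, 112)):
    # Step 1: convert the colour frame to grayscale
    x, y, w, h = box
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Step 2: crop the facial region and resize it to a fixed input size
    face = cv2.resize(gray[y:y + h, x:x + w], size)
    # Step 3: normalize pixel values to [0, 1] for a similar data distribution
    return face.astype(np.float32) / 255.0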
3.1.6. Training:
The CNN model is trained with the victims' faces. The preprocessed images are
fed into the CNN model, in which facial features such as the eyes, nose and mouth,
the keys to distinguishing each face, are detected and extracted by the convolutional
neural network. A unique feature vector in numeric form is developed for each face.
These numeric codes are also called face prints; each code uniquely identifies the
person among all the others in the training dataset.
3.1.7. Face Recognition:
The features are extracted from the preprocessed frames and then given for
matching. The code (face print) is compared against a database of other face prints,
i.e. the images that need to be compared. The system then identifies a match for the
extracted features in the provided database and returns the matched image together
with its label.
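A sketch of this matching step, assuming each face print is a NumPy feature vector produced by the CNN and that database maps labels to stored face prints (the names and the acceptance threshold are illustrative, not from the project code):

import numpy as np

def match_face_print(face_print, database, threshold=0.6):
    # Return the label of the closest stored face print, or None if no
    # stored print is close enough to count as a match.
    best_label, best_dist = None, float('inf')
    for label, stored_print in database.items():
        dist = np.linalg.norm(face_print - stored_print)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist < threshold else None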
CHAPTER 4
SYSTEM SPECIFICATION
The hardware and software for the system were selected by considering factors such
as CPU processing speed, peripheral channel speed, printer speed, seek time and
rotational delay of the hard disk, and communication speed. The hardware and
software specifications are as follows.
CHAPTER 5
DESIGN & IMPLEMENTATION
This chapter explains the design and implementation of the proposed face
recognition system, which consists of five modules:
1. Video fragmentation
2. Building Viola-Jones model for face detection
3. Preprocessing of detected frames.
4. Training CNN model for face recognition
5. Evaluating the model
The main purpose of video fragmentation is to capture frames from the input video
and provide them to the face detection framework, since it is impossible to identify
faces directly in a video. As a result, when a video is given to the system, it gets
partitioned (as shown in figure 4) into visually and temporally coherent bits, and for
each specified fragment a significant key frame, i.e. a frame containing faces, is
extracted and given to the framework.
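A minimal version of this step with OpenCV is sketched below; sampling one frame out of every twenty-five is an illustrative choice, since the project may select key frames differently:

import cv2

def extract_key_frames(video_path, every_n=25):
    # Read the video stream and keep one representative frame per fragment
    capture = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:  # end of the video stream
            break
        if index % every_n == 0:
            frames.append(frame)
        index += 1
    capture.release()
    return frames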
Human faces are spotted (as shown in figure 5) within a frame of the input video
in this segment, and high-precision face bounding boxes are returned. The system is
also capable of storing metadata for each detected face for later use. Consequently,
the faces in the frames are returned by this module.
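For reference, OpenCV ships pretrained Haar cascades implementing the same Viola-Jones scheme; a sketch of this module using that ready-made detector (rather than the from-scratch classifier of Appendix II) could look like this:

import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame):
    # Return (x, y, w, h) bounding boxes for the faces found in one frame
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)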
Eyes, nose, mouth, forehead and other characteristics are used to determine
whether a picture includes a human face. Accordingly, Haar-like features have been
created for those characteristics. Since the face includes both dark and light regions,
summing and comparing the pixel values of those regions is a convenient way to
determine which region is lighter or darker: the sum of darker-area pixel values will
be less than the sum of lighter-area pixel values. A Haar-like feature achieves this; it
is created by dividing a rectangular part of an image into multiple sections,
represented as adjacent black and white rectangles (shown in figure 6). Using the
integral image ii, the pixel sum of any such rectangle is obtained from just its four
corners:

sum = ii(bottom right) + ii(top left) − ii(top right) − ii(bottom left)
AdaBoost is used in the training phase to pick a subset of features and create the
classifier. A large collection of images is generated, the size of which corresponds to
the detection window's size. This collection must provide both positive (faces) and
negative (non-faces) examples for the desired filter. Each image has an index l,
where l = 1...L, and a corresponding label y_l is assigned to each image: faces have
y_l = 1 and non-faces have y_l = 0.
1. Initialize the weights as w_{1,l} = 1/(2P+) for faces and w_{1,l} = 1/(2P−) for
non-faces, where P− and P+ are the number of non-faces and faces in the image set.
2. For each feature j, train a weak classifier h_j that can only use that single feature.
The error rate of the classifier is calculated in terms of the weights w_{i,l}:

ε_j = Σ_l w_{i,l} |h_j(x_l) − y_l|

3. Choose the classifier h_i with the lowest error ε_i and update the weights:

w_{i+1,l} = w_{i,l} β_i^(1−e_l), where β_i = ε_i / (1 − ε_i)

and e_l = 0 if example x_l is classified correctly, e_l = 1 otherwise.

4. The final strong classifier is

C(x) = 1 if Σ_i α_i h_i(x) ≥ (1/2) Σ_i α_i, and 0 otherwise, where α_i = log(1/β_i).
5.3 PREPROCESSING OF FRAMES
The main objective of preprocessing the frames is to enhance the facial image data
by removing unnecessary distortions and strengthening essential image features for
subsequent processing. It is a critical mechanism because it has a direct effect on the
project's success rate; since the data in the frames is unclean, preprocessing
decreases the difficulty of the data under study. The detected frames are cropped,
converted into grayscale and finally normalized.
Convolutional Neural Networks (CNNs) are a form of neural network that has proven
extremely successful in image recognition and classification. In a CNN, there are four
major operations: convolution, non-linearity (ReLU), subsampling or pooling, and
classification (fully connected layer). Every such network is built on the foundation of
these operations. An image can be interpreted as a matrix of pixel values, and in a
CNN the primary objective of convolution is to extract features from the input image.
By learning image features over small squares of input data, convolution maintains
the spatial relationship between pixels. The facial region is recognized from the
preprocessed video frames using these learned features.
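To make the convolution operation concrete, here is a small illustrative NumPy sketch (not part of the project code) that slides a 3 x 3 filter over a grayscale image, which is how the spatial relationship between neighbouring pixels is preserved:

import numpy as np

def convolve2d(image, kernel):
    # Valid-mode 2D convolution: each output value is the weighted sum
    # of one image patch under the kernel.
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

edge_kernel = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]])  # responds strongly to vertical edges
feature_map = convolve2d(np.random.rand(8, 8), edge_kernel)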
The confusion matrix was generated (as shown in figure 18) to visualize
important predictive analytics such as recall, specificity, accuracy and precision, as
well as to compare the counts of True Positives, False Positives, True Negatives and
False Negatives. It is an N x N matrix for evaluating a classification model's results,
where N is the number of target classes; the matrix compares the actual target
values with the machine learning model's predictions. This provides a detailed
picture of how well the classification model is doing and the types of errors it makes.
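For a binary case, the metrics mentioned above follow directly from the four counts of the matrix; a small scikit-learn sketch with toy labels (for illustration only):

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # actual target values
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # model predictions
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)        # also called sensitivity
specificity = tn / (tn + fp)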
A loss graph and an accuracy graph for training and validation are plotted (as
shown in figure 20). Finally, the model's summary is retrieved and the model is saved
to disk. After saving the model, the prediction is made with the deployed Viola-Jones
model and the output (the detected face region) is obtained on the video frame
(shown in figure 17). The final output (the recognized face image) is achieved by the
above-mentioned CNN model, which reached the maximum accuracy. The overall
precision, recall and accuracy values for training the model are summarized and
evaluated (as shown in figure 21).
CHAPTER 6
IMPLEMENTATION
6.1 FUNCTION:
The functioning of all the modules can be well understood by portraying their
pseudocode as follows.
6.1.1 ALGORITHM:
Begin
Step 1: Read the input video and fragment it into frames.
Step 2: Detect the face regions in the frames using the Viola-Jones model.
Step 3: Preprocess the detected frames (grayscale conversion, cropping, normalization).
Step 4: Extract the facial features from the preprocessed frames using the CNN model.
Step 5: Generate the face print (feature vector) for each face.
Step 6: Compare the features of the preprocessed frames with those of the images in the database.
Step 7: Return the image which matches, together with its label.
End
CHAPTER 7
CONCLUSION
With the rapid development of video monitoring, which plays a crucial part in
detecting crime in public areas, identity authentication has become an indispensable
part of people's lives, so people put forward higher requirements on the safety and
reliability of identification, detection and accuracy. In this work, the Viola-Jones
algorithm has been implemented successfully, identifying the faces in a video with
an accuracy of 93%. The proposed technique benefits from data preprocessing
steps such as grayscale conversion and normalization, and achieves a high level of
face detection accuracy. The system has been tested on frontal facial images and
could be expanded to recognize rotating faces in real-time videos. The model is
trained on the CASIA WebFace image dataset to provide additional data that can be
used to enhance accuracy.
The confusion matrix was created to provide a visual representation of the CNN
model that had been trained to recognize faces and to compare the actual target
values with the model's predictions.
This system has been developed to be extremely useful and to provide protection at
a greater level in emergency situations. As a result, our system would decrease the
number of crimes committed in high-traffic areas. Future work on the proposed
system would provide a larger number of rotated images at various scales for
identification, to make the system work well in different conditions. Furthermore, a
new collection of features may be added to the features used in this work to improve
the system's performance. The experimental results show that the proposed
algorithm has a higher detection rate and better performance.
CHAPTER 8
APPENDIX - I
VIDEO FRAGMENTATION
VIOLA-JONES TRAINING
FACE DETECTION
PREPROCESSING
FACE RECOGNITION
APPENDIX – II
8.2 SOURCE CODE
import numpy as np
import math
import pickle
from sklearn.feature_selection import SelectPercentile, f_classif

class ViolaJones:
    def __init__(self, T=10):
        # T: number of weak classifiers (boosting rounds) in the final model
        self.T = T
        self.alphas = []
        self.clfs = []
    def train(self, training_data, pos_num, neg_num):
        # (Signature reconstructed from the call in the cascade code below.
        # The earlier part of the method -- computing the integral images and
        # building and applying the Haar features -- is not shown in this listing.)
        features = features[indices]
        print("Selected %d potential features" % len(X))
        for t in range(self.T):
            # Normalize the weights before each boosting round
            weights = weights / np.linalg.norm(weights)
            weak_classifiers = self.train_weak(X, y, features, weights)
            clf, error, accuracy = self.select_best(weak_classifiers, weights, training_data)
            beta = error / (1.0 - error)
            # Decrease the weight of correctly classified examples
            for i in range(len(accuracy)):
                weights[i] = weights[i] * (beta ** (1 - accuracy[i]))
            alpha = math.log(1.0 / beta)
            self.alphas.append(alpha)
            self.clfs.append(clf)
            print("Chose classifier: %s with accuracy: %f and alpha: %f" % (
                str(clf), len(accuracy) - sum(accuracy), alpha))
    def train_weak(self, X, y, features, weights):
        # (Fragment: for a given feature, scan the examples sorted by feature
        # value and choose the threshold/polarity with the lowest weighted
        # error. The setup of total_pos/total_neg and the enclosing loop over
        # features are not shown in this listing.)
        min_error, best_feature, best_threshold, best_polarity = float('inf'), None, None, None
        for w, f, label in applied_feature:
            error = min(neg_weights + total_pos - pos_weights,
                        pos_weights + total_neg - neg_weights)
            if error < min_error:
                min_error = error
                best_feature = features[index]
                best_threshold = f
                best_polarity = 1 if pos_seen > neg_seen else -1
            if label == 1:
                pos_seen += 1
                pos_weights += w
            else:
                neg_seen += 1
                neg_weights += w
        clf = WeakClassifier(best_feature[0], best_feature[1], best_threshold, best_polarity)
        classifiers.append(clf)
        return classifiers
        # Fragment of Haar feature generation: enumerate 2-, 3- and
        # 4-rectangle features for every position (i, j) and size (w, h)
        # inside the detection window (the enclosing loops are not shown).
                    immediate = RectangleRegion(i, j, w, h)
                    right = RectangleRegion(i + w, j, w, h)
                    # 2 rectangle features
                    if i + 2 * w < width:  # horizontally adjacent
                        features.append(([right], [immediate]))
                    bottom = RectangleRegion(i, j + h, w, h)
                    if j + 2 * h < height:  # vertically adjacent
                        features.append(([immediate], [bottom]))
                    right_2 = RectangleRegion(i + 2 * w, j, w, h)
                    # 3 rectangle features
                    if i + 3 * w < width:  # horizontally adjacent
                        features.append(([right], [right_2, immediate]))
                    bottom_2 = RectangleRegion(i, j + 2 * h, w, h)
                    if j + 3 * h < height:  # vertically adjacent
                        features.append(([bottom], [bottom_2, immediate]))
                    # 4 rectangle features
                    bottom_right = RectangleRegion(i + w, j + h, w, h)
                    if i + 2 * w < width and j + 2 * h < height:
                        features.append(([right, bottom], [immediate, bottom_right]))
                    j += 1
                i += 1
        return np.array(features)
        # (Fragment of select_best(classifiers, weights, training_data): keep
        # the weak classifier with the lowest weighted error.)
            if error < best_error:
                best_clf, best_error, best_accuracy = clf, error, accuracy
        return best_clf, best_error, best_accuracy

    @staticmethod
    def load(filename):
        # Restore a pickled ViolaJones model from disk
        with open(filename + ".pkl", 'rb') as f:
            return pickle.load(f)
class WeakClassifier:
    def __init__(self, positive_regions, negative_regions, threshold, polarity):
        self.positive_regions = positive_regions
        self.negative_regions = negative_regions
        self.threshold = threshold
        self.polarity = polarity

    def __str__(self):
        return "Weak Clf (threshold=%d, polarity=%d, %s, %s)" % (
            self.threshold, self.polarity,
            str(self.positive_regions), str(self.negative_regions))


class RectangleRegion:
    def __init__(self, x, y, width, height):
        self.x = x
        self.y = y
        self.width = width
        self.height = height

    def __str__(self):
        return "(x= %d, y= %d, width= %d, height= %d)" % (
            self.x, self.y, self.width, self.height)

    def __repr__(self):
        return "RectangleRegion(%d, %d, %d, %d)" % (
            self.x, self.y, self.width, self.height)
def integral_image(image):
    # ii[y][x] = sum of all pixels above and to the left of (x, y), inclusive;
    # s holds the cumulative column sums.
    ii = np.zeros(image.shape)
    s = np.zeros(image.shape)
    for y in range(len(image)):
        for x in range(len(image[y])):
            s[y][x] = s[y-1][x] + image[y][x] if y-1 >= 0 else image[y][x]
            ii[y][x] = ii[y][x-1] + s[y][x] if x-1 >= 0 else s[y][x]
    return ii
# Building the cascade classifier
from violajones import ViolaJones
import pickle

class CascadeClassifier():
    def __init__(self, layers):
        # layers: number of weak classifiers used in each cascade stage
        self.layers = layers
        self.clfs = []

    def train(self, training):
        pos, neg = [], []
        for ex in training:
            if ex[1] == 1:
                pos.append(ex)
            else:
                neg.append(ex)
        for feature_num in self.layers:
            if len(neg) == 0:
                print("Stopping early. FPR = 0")
                break
            clf = ViolaJones(T=feature_num)
            clf.train(pos + neg, len(pos), len(neg))
            self.clfs.append(clf)
            # Only the false positives of this stage are passed on as
            # negatives to the next stage of the cascade.
            false_positives = []
            for ex in neg:
                if self.classify(ex[0]) == 1:
                    false_positives.append(ex)
            neg = false_positives

    @staticmethod
    def load(filename):
        with open(filename + ".pkl", 'rb') as f:
            return pickle.load(f)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import roc_curve, auc
from sklearn.metrics import accuracy_score
from keras.utils import np_utils
import itertools

data = np.load('/content/drive/MyDrive/Face-recognition-using-CNN-master/ORL_faces/ORL_faces.npz')

# load the "Train Images" and normalize every image to [0, 1]
x_train = np.array(data['trainX'], dtype='float32') / 255
x_test = np.array(data['testX'], dtype='float32') / 255

# load the labels of the images
y_train = data['trainY']
y_test = data['testY']

# show the train and test data format
print('x_train shape: {}'.format(x_train.shape))
print('y_train shape: {}'.format(y_train.shape))
print('x_test shape: {}'.format(x_test.shape))

# hold out 5% of the training images for validation
x_train, x_valid, y_train, y_valid = train_test_split(
    x_train, y_train, test_size=0.05, random_state=1234)

im_rows = 112
im_cols = 92
batch_size = 512
im_shape = (im_rows, im_cols, 1)

# reshape the flat image vectors into 112 x 92 single-channel images
x_train = x_train.reshape(x_train.shape[0], *im_shape)
x_test = x_test.reshape(x_test.shape[0], *im_shape)
x_valid = x_valid.reshape(x_valid.shape[0], *im_shape)
print('x_train shape: {}'.format(x_train.shape))
print('x_test shape: {}'.format(x_test.shape))
# Model definition (the Keras imports below are assumed; they are not
# shown elsewhere in this listing)
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.optimizers import Adam

cnn_model = Sequential([
    Conv2D(filters=36, kernel_size=7, activation='relu', input_shape=im_shape),
    MaxPooling2D(pool_size=2),
    Conv2D(filters=54, kernel_size=5, activation='relu'),
    MaxPooling2D(pool_size=2),
    Flatten(),
    Dense(2024, activation='relu'),
    Dropout(0.5),
    Dense(1024, activation='relu'),
    Dropout(0.5),
    Dense(512, activation='relu'),
    Dropout(0.5),
    # 20 is the number of output classes (identities)
    Dense(20, activation='softmax')
])

cnn_model.compile(
    loss='sparse_categorical_crossentropy',  # integer labels, not one-hot
    optimizer=Adam(lr=0.0001),
    metrics=['accuracy']
)
cnn_model.summary()

history = cnn_model.fit(
    np.array(x_train), np.array(y_train), batch_size=512,
    epochs=250, verbose=2,
    validation_data=(np.array(x_valid), np.array(y_valid)),
)
# Confusion matrix plotting (the function signature is reconstructed from the
# calls below; cnf_matrix and ynew -- the predicted labels for x_test -- are
# computed earlier and not shown in this listing)
def plot_confusion_matrix(cm, classes, normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        # print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.show()

print('Confusion matrix, without normalization')
print(cnf_matrix)

# Plot the first ten classes and the last ten classes separately
plt.figure()
plot_confusion_matrix(cnf_matrix[0:10, 0:10], classes=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                      title='Confusion matrix, without normalization')
plt.figure()
plot_confusion_matrix(cnf_matrix[10:20, 10:20],
                      classes=[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
                      title='Confusion matrix, without normalization')

print("Confusion matrix:\n%s" % confusion_matrix(np.array(y_test), ynew))
print(classification_report(np.array(y_test), ynew))
CHAPTER 9
REFERENCES
This chapter contains the references of our project.
[1] Rajeev Ranjan, Ankan Bansal, Jingxiao Zheng, Hongyu Xu, Joshua Gleason,
Boyu Lu, Anirudh Nanduri, Jun-Cheng Chen, Carlos D. Castillo and Rama Chellappa,
"A Fast and Accurate System for Face Detection, Identification, and Verification",
Journal of LaTeX Class Files, 2015.
[3] Xiang Xu and Ioannis A. Kakadiaris, "Joint Head Pose Estimation and Face
Alignment Framework Using Global and Local CNN Features", IEEE International
Conference on Automatic Face & Gesture Recognition, 2017.
[4] Bhandiwad, V. and Tekwani, B., "Face recognition and detection using neural
networks", International Conference on Trends in Electronics and Informatics (ICEI),
2017.
[5] Chaudhari, M. N., Deshmukh, M., Ramrakhiani, G. and Parvatikar, R., "Face
Detection Using Viola Jones Algorithm and Neural Networks", Fourth International
Conference on Computing Communication Control and Automation, 2018.
[6] Arne Schumann, Andreas Specker and Jurgen Beyerer, "Attribute-based Person
Retrieval and Search in Video Sequences", International Conference on Advanced
Video and Signal Based Surveillance, 2018.
[7] Haofei Wang, Bertram E. Shi and Yiwen Wang, "Convolutional Neural Network
for Target Face Detection Using Single-Trial EEG Signal", Conference of the IEEE
Engineering in Medicine and Biology Society, 2018.
[8] Wenqi Wu, Yingjie Yin, Xingang Wang and De Xu, "Face Detection With Different
Scales Based on Faster R-CNN", IEEE Transactions on Cybernetics, 2018.
[9] Sang-Il Choi, Sung-Sin Lee, Sang Tae Choi and Won-Yong Shin, "Face
Recognition Using Composite Features Based on Discriminant Analysis", IEEE
Access, 2018.
[11] Chen Yan, Zhengqun Wang and Chunlin Xu, "Gentle Adaboost algorithm based
on multifeature fusion for face detection", IEEE Conference on Automation, 2018.
[12] Yongjun Zhang, Qian Wang, Ling Xiao and Zhongwei Cui, "An Improved
Two-Step Face Recognition Algorithm Based on Sparse Representation", IEEE
Access, 2019.
[13] Yong Li, Jiabei Zeng, Shiguang Shan and Xilin Chen, "Occlusion Aware Facial
Expression Recognition Using CNN With Attention Mechanism", IEEE Transactions
on Image Processing, 2019.
[14] Muhammad Zeeshan Khan, Saad Harous and Saleet Ul Hassan, "Deep Unified
Model for Face Recognition Based on Convolution Neural Network and Edge
Computing", IEEE Access, 2019.
[15] Haonan Chen, Yaowu Chen, Xiang Tian and Rongxin Jiang, "Cascade Face
Spoofing Detector Based on Face Anti-Spoofing R-CNN and Improved Retinex LBP",
IEEE Access, 2019.
[16] Manminder Singh and Ajat Shatru Arora, "Computer Aided Face Liveness
Detection with Facial Thermography", Wireless Personal Communications, Springer,
2019.
[17] An-Ping Song, Qian Hu, Xue-Hai Ding, Xin-Yi Di and Zheng Song, "Similar Face
Recognition Using the IE-CNN Model", IEEE Access, 2020.
[18] Zuolin Dong, Jiahong Wei, Xiaoyu Chen and Pengfei Zheng, "Face Detection in
Security Monitoring Based on Artificial Intelligence Video Retrieval Technology",
IEEE Access, 2020.
[19] Fatimah Khalid, Noor Amjeed, Rahmita Wirza O.K. Wirza, Hizmawati Madzin
and Illiana Azizan, "Face Recognition for Varying Illumination and Different Optical
Zoom using a Combination of Binary and Geometric Features", 2020.
[20] Wenyun Sun, Yu Song, Haitao Zhao and Zhong Jin, "A Face Spoofing Detection
Method Based on Domain Adaptation and Lossless Size Adaptation", IEEE Access,
2020.