Comparative Study of Different Face Recognition
Algorithms
Aayush Shah, Maitrik Shah, Nihar Shah, Soham Joshi, and Yagnesh M. Bhadiyadra
Information and Communication Technology Engineering
School of Engineering and Applied Science, Ahmedabad University
Ahmedabad, India
[email protected], [email protected], [email protected], [email protected], [email protected]

Abstract—In this paper, we take into account different approaches, namely Eigenfaces, Principal Component Analysis (PCA), Support Vector Machines (SVM), Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and K-Nearest Neighbour (KNN), for the problem of face recognition, and compare all the approaches on the basis of different performance metrics, such as accuracy, number of iterations, and error rate, to see which technique is more feasible in real life. We then also recognize emotions in a face using Support Vector Machines (SVM) and Convolutional Neural Networks (CNN). The approaches we have considered treat face recognition and emotion recognition as two-dimensional recognition problems, the advantage being that faces can be described by a small set of 2-D characteristic views.

Keywords - Eigenfaces, Principal Component Analysis (PCA), Support Vector Machines (SVM), Convolutional Neural Networks (CNN), Face Recognition, Emotion Recognition.

I. INTRODUCTION

Face recognition has a wide range of applications such as identity authentication, access control, biometrics, and surveillance, and research activity in the area has increased over the past few years. Emotion recognition also finds many real-life applications in medical science, patient health monitoring in the health-care sector, marketing, and elsewhere. In addition, "the variations between the images of the same face due to illumination and viewing direction are almost always larger than image variations due to change in face identity"[1]. This presents a challenge in the domains of both face recognition and emotion recognition. The central issues are what features to use to represent a face and how to classify a new face image based on the chosen representation. Various relations (i.e., distances, angles) and several other properties are often used as descriptors of faces for recognition.

Section 2 describes the work done so far in the area. Section 3 describes in detail the various methods implemented for face recognition and also elaborates the working methodology of emotion recognition through Support Vector Machines (SVM) and Convolutional Neural Networks (CNN). Section 4 presents the comparison results of all the methods, together with the results of emotion recognition. Section 5 concludes the paper with an overview of the results and possibilities for future work.

II. RELATED WORK AND MOTIVATION

Face recognition is a very important problem in today's world, and very advanced technologies are being used to address it; 3-D recognition, for instance, is now available as a security feature in smartphones. The first solution to the problem was proposed in 1888 by Francis Galton[2]. He collected facial profiles as curves, calculated their norm, and classified input according to the difference between that norm and the norm of a standard face profile, likewise collected as curves. This classification resulted in a vector of independent measures that could be compared with the other vectors in the database. The field has developed rapidly over the past few years; one approach that came into existence in 1997 suggested three basic types of face recognition algorithms: frontal, profile, and view-tolerant recognition. The first, frontal recognition, is the simple classic approach to face recognition, but better results can be observed with the second (profile-based) approach, which takes into account some of the physics, geometry, and statistics; as single, stand-alone systems, however, profile schemes have a rather marginal significance for identification[3].
One very important innovation in the field of face detection was the Haar-like classifier-based approach, in which the authors selected various rectangle-based features; with these, face detection achieved an accuracy of up to 93% when enough data was available to train the model[14].

Eigenfaces are one of the most frequently considered approaches to this problem. The technique is known by many names, including the Karhunen-Loève expansion, eigenpictures, eigenvectors, and principal components. Eigenfaces are, mathematically, the principal components of the distribution of faces, or the eigenvectors of the covariance matrix of the set of face images. One of the approaches suggested that a "weights" vector describing a face could be obtained by projecting the face onto a standard face picture (eigenpicture)[4, 5]. Reference [6] used eigenfaces and the technique proposed by Kirby and Sirovich[15] for face detection and recognition. Each face can be represented exactly by a linear combination of the eigenfaces, and the variation among faces can be represented by the eigenvectors.

A. Pentland, B. Moghaddam, and T. Starner extended the work done with eigenfaces and eigenvectors with a modular approach[7]. They considered various face components, such as the eyes and nose, and created a new eigenspace containing the corresponding eigenfeatures (i.e., eigeneyes, eigennoses, and eigenmouths). Such a system is less susceptible to appearance changes than the traditional eigenface method; it achieved a recognition rate of approximately 95 percent on the FERET database of 7,562 images of approximately 3,000 individuals.

Many algorithms other than eigenface-based ones have also been proposed; Artificial Neural Networks (ANN) were one of them, becoming popular owing to their non-linearity. The way one constructs the artificial neural network is extremely important for the problem of face recognition. One of the first attempts to use artificial neural networks for facial recognition was a single-layer adaptive network called WISARD, in which a separate network was constructed for each stored individual[8]. A Convolutional Neural Network (CNN) based face recognition approach was also proposed in 1997, and a Probabilistic Decision-Based Neural Network (PDBNN)[9], building on the Decision-Based Neural Network (DBNN)[10], was likewise proposed as a solution to the face recognition problem.

Edge detection has also been considered as an approach to face identification; many smartphone cameras nowadays use various kinds of edge detection techniques for blur effects and better photographic quality, and edge information is insensitive to illumination effects. An edge-detection-based approach was used for face recognition in [11]. The technique named "Support Vector Machines (SVM)" finds, given a set of points belonging to two classes, the hyperplane that separates the largest possible fraction of points of the same class on the same side while maximizing the distance of either class from the hyperplane. This hyperplane, according to [12], is called the Optimal Separating Hyperplane (OSH); it minimizes the chances of misclassifying examples both inside and outside the training set. Reference [13] used Support Vector Machines (SVM) with a binary tree recognition strategy for the face recognition problem.

III. IMPLEMENTATION

The process of face recognition starts with loading the dataset and extracting/detecting the faces from the images. Face detection is an important part of face recognition, being the first step of automatic face recognition. Because human faces convey many different emotions, such as happiness, sadness, interest, excitement, confusion, and intrigue, paying attention to a person's face lets one develop an idea of what that person is thinking and what they might do next. For businesses, all of this information is extremely valuable, as it can help with understanding how the intended target audience feels about a brand and its communication at all its contact points. Detecting the face also mitigates overfitting, where the algorithm starts to memorize rather than generalize the data by taking in the background of the images: we remove the background and apply the different recognition methods to the faces alone. Face detection is done with the HOG face detector, a widely used face detection model built out of 5 HOG filters: front-looking, left-looking, right-looking, front-looking rotated left, and front-looking rotated right. It is the fastest method on a CPU, works very well for frontal and slightly non-frontal faces, is lightweight compared to the Haar cascade and DNN face detectors, and works under small occlusions. After completing face detection on the available dataset, data augmentation is done by flipping the images, rotating them left and right, and adding noise; this was required because we had a lot of parameters to train, and to achieve high accuracy the number of training samples had to be large. After data augmentation, the various classifiers and models are applied and their performance is compared, as sketched below.
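As a concrete illustration, the following is a minimal sketch of this pipeline in Python, assuming dlib's HOG-based frontal face detector (an implementation of the five-filter model described above) and scikit-image for image handling; the crop size, rotation angle, and noise level are illustrative choices, not the exact values used in our experiments.

import dlib
import numpy as np
from skimage import io, transform

detector = dlib.get_frontal_face_detector()  # HOG + linear SVM face detector

def detect_and_crop(path, size=(64, 64)):
    """Detect the largest face in an image and return a resized gray crop."""
    img = io.imread(path, as_gray=True)
    rects = detector((img * 255).astype(np.uint8), 1)  # upsample once
    if not rects:
        return None
    r = max(rects, key=lambda r: r.width() * r.height())
    face = img[max(r.top(), 0):r.bottom(), max(r.left(), 0):r.right()]
    return transform.resize(face, size)

def augment(face):
    """Yield the augmented variants used above: flip, rotations, noise."""
    yield face
    yield np.fliplr(face)                           # horizontal flip
    yield transform.rotate(face, 10, mode='edge')   # rotate left
    yield transform.rotate(face, -10, mode='edge')  # rotate right
    yield np.clip(face + np.random.normal(0, 0.02, face.shape), 0, 1)  # noise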
1. K-Nearest Neighbour Classifier (KNN)

K-nearest neighbours is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (e.g., a distance function). Simply put, the KNN algorithm classifies unknown data points by finding the most common class among the K closest examples: each data point among the K closest examples casts a vote, and the category with the most votes wins.

Figure 1. The principle diagram of the K-NN classification algorithm.

As shown in Figure 1, the red square represents the sample to be classified; it must be assigned to either the blue stars or the green triangles. With k set to 5 (the smaller circle), it is classified as a green triangle, since the probability of classifying it as a green triangle is 60%, higher than that of classifying it as a blue star (40%). With k set to 10 (the bigger circle), the red square is classified as a blue star, since the probability of classifying it as a blue star is 60%, higher than the probability of classifying it as a green triangle (40%). In order to apply k-nearest neighbour classification, we need to define a distance metric or similarity function. A common choice is the Euclidean distance,

d(p, q) = √( Σᵢ₌₁ᴺ (qᵢ − pᵢ)² ),

where N is the number of variables and qᵢ and pᵢ are the values of the i-th variable at points p and q respectively. Other distance metrics/similarity functions can be used depending on the type of data (the chi-squared distance is often used for distributions (histograms)). In our case, we have been using the Euclidean distance to compare images for similarity.
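As a minimal sketch of this classifier (assuming each face crop has already been flattened into a feature vector; the value k = 5, the train/test split, and the random stand-in data are illustrative), scikit-learn's KNeighborsClassifier implements exactly this voting scheme with the Euclidean metric:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 64 * 64))   # stand-in for 200 flattened 64x64 face crops
y = rng.integers(0, 5, 200)      # stand-in labels for 5 subjects

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Each of the k closest training images casts one vote; the majority wins.
knn = KNeighborsClassifier(n_neighbors=5, metric='euclidean')
knn.fit(X_tr, y_tr)
print('accuracy:', knn.score(X_te, y_te))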
2. Convolutional Neural Network (CNN)

The Convolutional Neural Networks (CNN) are very similar to ordinary Neural Networks: they are made up of neurons that have learnable weights and biases. Each neuron receives some inputs, performs a dot product, and optionally follows it with a non-linearity. The whole network still expresses a single differentiable score function, from the raw image pixels at one end to class scores at the other. It still has a loss function (e.g. SVM/Softmax) on the last (fully-connected) layer, and all the tips and tricks developed for learning regular Neural Networks still apply.

Figure 2. The Convolutional Neural Network (CNN).

The CNN consists of multiple layers. Each layer takes a multi-dimensional array of numbers as input and produces another multi-dimensional array of numbers as output (which then becomes the input of the next layer). When classifying images, the input to the first layer is the input image (32 × 32), while the output of the final layer is a set of likelihoods of the different categories (i.e., 1 × 1 × 10 numbers if there are 10 categories). A simple CNN is a sequence of layers, and every layer of a CNN transforms one volume of activations to another through a differentiable function. We have used three main types of layers to build CNN architectures: the Convolution (CONV) layer, the Pooling layer, and the Fully-Connected (FC) layer (exactly as seen in regular Neural Networks). We have stacked these layers to form a full CNN architecture:

• INPUT [32 × 32] holds the raw pixel values of the image, in this case an image of width 32 and height 32.

• The CONV layer computes the output of neurons that are connected to local regions in the input, each computing a dot product between its weights and the small region it is connected to in the input volume. This may result in a volume such as [32 × 32 × 12] if we decide to use 12 filters.

• The POOL layer performs a down-sampling operation along the spatial dimensions (width, height), resulting in a volume such as [16 × 16 × 12].

The pooling layer downsamples the volume spatially, independently in each depth slice of the input volume. For example, an input volume of size [224 × 224 × 64] pooled with filter size 2 and stride 2 yields an output volume of size [112 × 112 × 64]; notice that the volume depth is preserved. The most common downsampling operation is the max, giving rise to max pooling: each max is taken over 4 numbers (a little 2 × 2 square). In this way, the CNN transforms the original image layer by layer from the original pixel values to the final class scores. Note that some layers contain parameters and others don't. In particular, the CONV/FC layers perform transformations that are a function not only of the activations in the input volume but also of the parameters (the weights and biases of the neurons), whereas the RELU/POOL layers implement a fixed function. The parameters in the CONV/FC layers are trained with gradient descent so that the class scores the CNN computes are consistent with the labels in the training set for each image. A CNN architecture is, in the simplest case, a list of layers that transform the image volume into an output volume (e.g. holding the class scores):

• There are a few distinct types of layers (CONV/FC/RELU/POOL are by far the most popular).

• Each layer accepts an input 3D volume and transforms it into an output 3D volume through a differentiable function.

• Each layer may (CONV/FC) or may not (RELU/POOL) have parameters.

• Each layer may (CONV/FC/POOL) or may not (RELU) have additional hyperparameters.
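As a minimal sketch, the 32 × 32 INPUT, CONV (12 filters), POOL, FC stack used as the running example above can be written in Keras as follows; the kernel size, activation, and single CONV/POOL stage are illustrative assumptions, not our exact trained architecture.

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(32, 32, 1)),             # INPUT: raw 32x32 pixels
    layers.Conv2D(12, (3, 3), activation='relu', padding='same'),
    # CONV with 12 filters -> activation volume [32 x 32 x 12], as above
    layers.MaxPooling2D((2, 2)),                 # POOL: down to [16 x 16 x 12]
    layers.Flatten(),
    layers.Dense(10, activation='softmax'),      # FC: 1 x 1 x 10 class scores
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()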
3. Principal Component Analysis

Principal component analysis (PCA) is a technique that can be used to simplify a dataset. It is a linear transformation that chooses a new coordinate system for the data set such that the greatest variance by any projection of the set comes to lie on the first axis (the first principal component), the second greatest on the second axis, and so on. PCA can be used for reducing dimensionality in a dataset while retaining those characteristics of the dataset that contribute most to its variance, by keeping lower-order principal components and ignoring higher-order ones. The idea is that such low-order components often contain the "most important" aspects of the data.

The task of facial recognition is discriminating input signals (image data) into several classes (persons). The input signals are highly noisy (e.g. the noise is caused by differing lighting conditions, pose, etc.), yet the input images are not completely random, and in spite of their differences there are patterns that occur in any input signal. Such patterns, which can be observed in all signals, could be, in the domain of facial recognition, the presence of some objects (eyes, nose, mouth) in any face, as well as the relative distances between these objects. These characteristic features are called eigenfaces in the facial recognition domain (or principal components generally). They can be extracted out of the original image data by means of the mathematical tool called Principal Component Analysis (PCA).

By means of PCA, one can transform each original image of the training set into a corresponding eigenface. If one uses all the eigenfaces extracted from the original images, one can reconstruct the original images from the eigenfaces exactly; but one can also use only a part of the eigenfaces, in which case the reconstructed image is only an approximation of the original. The losses due to omitting some of the eigenfaces can be minimized by choosing only the most important features (eigenfaces); this omission is necessary owing to the scarcity of computational resources. Thus, PCA reduces the large dimensionality of the space of observed variables to the smaller intrinsic dimensionality of the feature space (independent variables), which is all that is needed to describe the data economically. This is the case when there is a strong correlation between observed variables.

To generate a set of eigenfaces, a large set of digitized images of human faces, taken under the same lighting conditions, is normalized to line up the eyes and mouths. The images are then all resampled at the same pixel resolution (say m×n) and treated as mn-dimensional vectors whose components are the values of their pixels. The eigenvectors of the covariance matrix of the statistical distribution of face image vectors are then extracted. Since these eigenvectors belong to the same vector space as the face images, they can be viewed as m×n images: hence the name eigenfaces. Some eigenfaces are hard to categorize and look rather strange. When properly weighted, eigenfaces can be summed together to create an approximate gray-scale rendering of a human face. Remarkably few eigenvector terms are needed to give a fair likeness of most people's faces, so eigenfaces provide a means of applying data compression to faces for identification purposes. Given a set of weights, it is possible not only to reconstruct a face from the eigenfaces but also to go the opposite way: to extract from the eigenfaces the weights of the face to be recognized. These weights express nothing less than the amount by which the face in question differs from the "typical" faces represented by the eigenfaces. Therefore, using these weights one can determine two important things:

• Whether the image in question is a face at all: if the weights of the image differ too much from the weights of face images (i.e. images of which we know for sure that they are faces), the image probably is not a face.

• Similar faces (images) possess similar features (eigenfaces) to similar degrees (weights): if one extracts the weights from all the images available, the images can be grouped into clusters, and all images having similar weights are likely to be similar faces.
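As a minimal sketch of this procedure (assuming X holds aligned face images flattened to mn-dimensional vectors; the image size and number of retained components are illustrative), scikit-learn's PCA yields the eigenfaces, the per-image weights, and the approximate reconstruction described above:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.random((100, 64 * 64))   # stand-in for 100 flattened 64x64 faces

pca = PCA(n_components=20)       # keep only the 20 lowest-order components
weights = pca.fit_transform(X)   # weight vector for each training face
eigenfaces = pca.components_.reshape((20, 64, 64))

# Reconstruction: mean face plus the weighted sum of eigenfaces. It is
# exact only if all components are kept; with 20 it is an approximation.
approx_face = pca.inverse_transform(weights[:1]).reshape(64, 64)

# Weights of a new image: if they differ too much from all known face
# weights, the image is probably not a face; similar weights cluster
# together and indicate similar faces.
new_weights = pca.transform(X[:1])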
4. Support Vector Machines (SVM)

A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane; in other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples. An SVM performs classification by constructing an N-dimensional hyperplane that optimally separates the data into two categories. SVM models are closely related to neural networks; in fact, an SVM model using a sigmoid kernel function is equivalent to a two-layer perceptron neural network. Using a kernel function, SVMs are an alternative training method for polynomial, radial-basis-function, and multi-layer perceptron classifiers in which the weights of the network are found by solving a quadratic programming problem with linear constraints, rather than by solving a non-convex, unconstrained minimization problem as in standard neural network training. A set of features that describes one case (i.e., a row of predictor values) is called a vector. The goal of SVM modeling is therefore to find the optimal hyperplane that separates clusters of vectors in such a way that cases with one category of the target variable lie on one side of the plane and cases with the other category lie on the other side. The vectors near the hyperplane are the support vectors.
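As a minimal sketch of this classifier (the random stand-in feature vectors, RBF kernel, and C value are illustrative assumptions; scikit-learn's SVC handles the multi-class case by combining binary maximum-margin classifiers):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((200, 100))   # stand-in for 200 face feature vectors
y = rng.integers(0, 5, 200)  # stand-in labels for 5 subjects

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# SVC finds the maximum-margin separating hyperplane (the OSH described
# earlier); the training points nearest to it are the support vectors.
clf = SVC(kernel='rbf', C=10)
clf.fit(X_tr, y_tr)
print('accuracy:', clf.score(X_te, y_te))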
IV. EXPERIMENTAL RESULTS
Here, the figures show the detailed results of face recognition and its accuracy step by step.

Figure 3. Augmented Data.
Figure 4. Detected Face Using HOG Face Detector.
Figure 5. Face Recognition using Eigenface-PCA.
Figure 6. Recognized faces using Eigenfaces.
Figure 7. PCA Model Fitting for Number of Iterations.
Figure 8. Face Recognition using KNN and Accuracy calculation.
Figure 9. Face Recognition using SVM.
Figure 10. Face Recognition accuracy using CNN.
Figure 10. CNN Face Recognition: correct labels.
Figure 11. CNN Face Recognition: incorrect labels.
Figure 12. Emotion Recognition: SVM.
Figure 13. The Emotion Recognition accuracy of CNN.
Figure 14. CNN Emotion Recognition: correct labels.
Figure 15. CNN Emotion Recognition: incorrect labels.
Figure 16. CNN Emotion Recognition Classification Report.
TABLE I. PERFORMANCE OF DIFFERENT METHODS ON DATASET FOR FACE RECOGNITION

Method    Accuracy
KNN       83.46%
SVM       99.027%
CNN       96.88%
PCA       97.407%

TABLE II. PERFORMANCE OF DIFFERENT METHODS ON DATASET FOR EMOTION RECOGNITION

Method    Accuracy
SVM       71.206%
CNN       81.32%
Figure 17. Training and validation accuracy and loss for Face Recognition using CNN.
Figure 18. Training and validation accuracy for Emotion Recognition using CNN.
Figure 19. Training and validation loss for Emotion Recognition using CNN.
V. CONCLUSION
Various models, namely SVM, KNN, CNN, and PCA, were tested on a dataset obtained by capturing frontal faces of all the classmates. In face recognition, it was observed that PCA performed the best, because PCA was used to extract features from the images of the dataset and the eigenfaces obtained were then trained with a Multi-Layer Perceptron (MLP) classifier using a neural network. Also, the training and validation graphs obtained from the CNN show that accuracy increases and loss decreases as the number of epochs increases; the same can be observed in the graphs obtained during emotion recognition with the CNN. In emotion recognition, as the number of parameters increased, the CNN performed the best, and the dropout layer used in the CNN also reduced the chances of overfitting in the model. For future work, Gabor filters and other algorithms can be applied to the models and their performance measured.
VI. REFERENCES
[1] P.N. Belhumeur, J.P. Hespanha, and D.J. Kriegman, “Eigenfaces vs.
Fisherfaces: Recognition Using Class Specific Linear Projection,”
European Conf. Computer Vision, 1996, pp. 45-58.
[2] Francis Galton, “Personal identification and description,” In Nature,
pp. 173-177, June 21, 1888.
[3] T. Fromherz, P. Stucki, M. Bichsel, “A survey of face recognition,”
MML Technical Report, No 97.01, Dept. of Computer Science,
University of Zurich, Zurich, 1997.
[4] L. Sirovich and M. Kirby, “Low-Dimensional procedure for the characterisation of human faces,” J. Optical Soc. of Am., vol. 4, pp. 519-524, 1987.
[5] M. Kirby and L. Sirovich, “Application of the Karhunen-Loève procedure for the characterisation of human faces,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 12, pp. 831-835, Dec. 1990.
[6] M. Turk and A. Pentland, “Eigenfaces for recognition,” J. Cognitive
Neuroscience, vol. 3, pp. 71-86, 1991.
[7] A. Pentland, B. Moghaddam, and T. Starner, “View-Based and
modular eigenspaces for face recognition,” Proc. IEEE CS Conf.
Computer Vision and Pattern Recognition, pp. 84-91, 1994.
[8] T.J. Stonham, “Practical face recognition and verification with
WISARD,” Aspects of Face Processing, pp. 426-441, 1984.
[9] S.H. Lin, S.Y. Kung, and L.J. Lin, “Face recognition/detection by
probabilistic decision-based neural network,” IEEE Trans. Neural
Networks, vol. 8, pp. 114-132, 1997.
[10] S.Y. Kung and J.S. Taur, “Decision-Based neural networks with
signal/image classification applications,” IEEE Trans. Neural
Networks, vol. 6, pp. 170-181, 1995.
[11] B. Takács, “Comparing face images using the modified Hausdorff distance,” Pattern Recognition, vol. 31, pp. 1873-1881, 1998.
[12] V.N. Vapnik, “The nature of statistical learning theory,” New York: Springer-Verlag, 1995.
[13] G. Guo, S.Z. Li, and K. Chan, “Face recognition by support vector machines,” Proc. IEEE International Conference on Automatic Face and Gesture Recognition, pp. 196-201, 2000.
[14] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” Proc. CVPR, vol. 1, pp. 511-518, 2001.
[15] L. Sirovich and M. Kirby, “Low-Dimensional Procedure for the
characterization of human faces,” J. Optical Soc. of Am., vol. 4, pp.
519-524, 1987.
[16] P.J. Phillips, “Support vector machines applied to face recognition,” Advances in Neural Information Processing Systems 11, 1999.
[17] J. Huang, X. Shao, and H. Wechsler, “Face pose discrimination using support vector machines,” Proc. 14th International Conference on Pattern Recognition (ICPR), Brisbane, Queensland, Australia, 1998.
[18] W. Zhao and R. Chellappa, “SFS based view synthesis for robust face
recognition,” Proc. Int’l Conf. Automatic Face and Gesture
Recognition, pp. 285-292, 2000.
[19] T. Vetter and V. Blanz, “Estimating coloured 3D face models from single images: An example based approach,” Proc. European Conf. Computer Vision (ECCV ’98), vol. II, 1998.
[20] I.J. Cox, J. Ghosn, and P.N. Yianilos, “Feature-Based face recognition using mixture-distance,” Computer Vision and Pattern Recognition, 1996.
[21] S. Tamura, H. Kawai, and H. Mitsumoto, “Male/Female identification from 8×6 very low resolution face images by neural network,” Pattern Recognition, vol. 29, pp. 331-335, 1996.
[22] K.K. Sung and T. Poggio, “Learning human face detection in cluttered scenes,” Computer Analysis of Images and Patterns, pp. 432-439, 1995.
[23] G.J. Edwards, T.F. Cootes, and C.J. Taylor, “Face recognition using
active appearance models,” In ECCV, 1998.
[24] John A. Black, M. Gargesha, K. Kahol, P. Kuchi, and Sethuraman Panchanathan, “A framework for performance evaluation of face recognition algorithms,” in Proceedings of the International Conference on ITCOM, Internet Multimedia Systems II, 2002.
[25] P.J. Phillips, P. Grother, R.J. Michaels, D.M. Blackburn, E. Tabassi, and M. Bone, “Face Recognition Vendor Test 2002: Evaluation Report,” NISTIR 6965, Nat. Inst. of Standards and Technology, 2003.
[26] Y. Kaya and K. Kobayashi, “A basic study on human face recognition,” Frontiers of Pattern Recognition, S. Watanabe, ed., pp. 265, 1972.