Alphabet Recognition of Sign Language using Machine Learning

Avinash Kumar Sharma, Abhyudaya Mittal, Aashna Kapoor, Aditi Tiwari
Computer Science and Engineering, ABES Institute of Technology, Ghaziabad, India
avinashsharma2006@[Link] | mittalabhyudaya@[Link] | aashnak990@[Link] | tiwari.aditi2110@[Link]
Abstract - One of the major issues our society is dealing with is the difficulty that people with disabilities face in sharing their feelings with other people. People with disabilities can communicate through sign (gesture) languages. This project aims to design a model that can recognize sign language alphabets (hand gestures) and convert them into text and sound using a machine learning approach. The main goal of this project is to break down barriers to communication between people with disabilities and the rest of society. The performance of the method is evaluated on a publicly available ISL dataset. Our project is based on Convolutional Neural Networks: we used the Inception V3 deep learning model for image classification. Hand gestures are captured as images by a webcam, and the trained model recognizes the alphabet corresponding to each hand gesture. We have tried to overcome the existing limitations in Sign Language Recognition and to increase its efficiency.

Keywords — Sign Language, Hand Gesture, Gesture Recognition, Human Computer Interaction, Sign Language Recognition.

I. INTRODUCTION

Communication is an indispensable tool for human existence and a basic, effective way to share thoughts, feelings and opinions, but a significant portion of the world's population lacks this ability. Hearing loss, speech disability, or both affect a large number of people. Hearing loss is defined as a partial or total inability to hear in one or both ears. Muteness, on the other hand, is a disability that prevents people from speaking and leaves them unable to communicate verbally. If a child becomes deaf-mute during childhood, their capacity to learn languages is hampered, resulting in language impairment, also known as hearing mutism. Those who are unable to communicate verbally and have hearing impairments face difficulty in everyday communication, and this hearing or speech disability results in a shortage of equal opportunity for them [1] [2].

People who are deaf or deaf-blind use sign language as a means of communication. A sign language is made up of a variety of gestures composed of diverse hand shapes, motions and orientations, as well as facial expressions. Hearing loss affects roughly 466 million people globally, 34 million of them children. Individuals who are labelled as "deaf" have very limited or no hearing capabilities.

Only a small percentage of the population is aware of sign language. It is also not an international language, contrary to common perception. Obviously, this makes communication between the Deaf population and the hearing majority even more difficult. Because the Deaf community is generally less adept at writing a spoken language, the option of written communication is inconvenient [3]. Hearing or speech problems affect about 0.05 percent of the world's population, according to the United Nations Statistics Division. The disability was present from birth in 63 percent of these cases, whereas the rest acquired it as the result of an accident. According to JICA disability statistics, hearing impairments account for 8.36 percent of all disabilities in India, while speech difficulties account for 5.06 percent. For a deaf population of around 7 million individuals in India, there are only about 250 competent sign language interpreters [4]. The Department of Empowerment of Persons with Disabilities, part of the Ministry of Social Justice and Empowerment, deals with policies for people with disabilities, and the ministry's ISLRTC is in charge of schools for the deaf [5].

The ISLRTC has ties to a number of Indian schools. Aside from that, many groups, such as the Indian Deaf and Dumb Society, try to give various forms of assistance on their own. Schools for the deaf rely primarily on verbal communication due to a dearth of skilled teachers. This is the state of the large centres; in rural areas there are no such institutions or assistance at all. As a result, residents of these locations experience severe psychological distress and feel utterly cut off from the rest of the world. Even when they reach adulthood, they remain reliant on their relatives, or they struggle to make ends meet because they are unable to find suitable work.

The biggest problem is that able people are either unwilling to learn sign languages or find them difficult to remember. Researchers have tried a variety of ways of recognizing diverse hand gestures in order to allow hearing people to comprehend sign languages and to eliminate barriers in our society for individuals with disabilities. With the advancement of modern technology, we can find a variety of ways to integrate these people into society. The availability of sensors, cameras, and AI technologies such as deep learning, CNNs, ANNs, and speech-to-voice as well as speech-to-text programs has opened the way for the development of useful gadgets. We can undoubtedly make significant progress in engaging with these people with the help of such new technologies.
A. Sign Language and Gestures

To visually transmit sign patterns that convey meaning, sign language uses a sequence of facial expressions, orientations, hand shapes, and hand and body movements. Hand gestures are crucial for deaf and mute people who use sign language to communicate with the outside world. Sign language has been shown to be useful for communicating a wide range of needs, from basic necessities to complex concepts. There are three types of signing, as follows:

i) Fingerspelling: one letter at a time
ii) Word-level sign vocabulary: commonly used communication words
iii) Non-manual features: whole-body movements, including facial expressions and body position

There are a variety of sign languages used in different countries. The most popular and frequently used among them is American Sign Language.

B. American Sign Language (ASL)

American Sign Language (ASL) is a complete, natural language with linguistic features similar to spoken languages and a grammar distinct from English. Hand and face movements are used to express ASL, as shown in Figure 1. It is the predominant language of many deaf and hard-of-hearing North Americans, as well as some hearing persons. It has its own rules for pronunciation, word creation, and word order, as well as all of the other basic characteristics of a language. Although the precise roots of ASL are uncertain, some suggest that it developed more than 200 years ago from the blending of local sign languages and French Sign Language [6].

ASL, like all languages, is a living language; it evolves with time. Many high schools, colleges, and universities in the United States accept it as a modern and "foreign" language fulfilling requirements for academic degrees.

Figure 1: American Sign Language

C. Indian Sign Language (ISL)

Indian Sign Language (ISL) is India's most commonly used sign language; it is referred to as the mother tongue in some metropolitan regions due to its widespread use. ISL is a collection of authentic sign languages that have grown over time and are widely used, as shown in Figure 2 [7]. India's sign language is very scientific, with its own grammar.

ISL signs are divided into two categories, manual and non-manual, as Figure 2 depicts:

i) Manual: performed with one or both hands.
ii) Non-manual: facial expressions can be used.

Figure 2: Indian Sign Language

II. SYSTEM OVERVIEW

The goal is to create a system that can recognise and classify sign language motions from recorded datasets, as shown in Figure 3. The suggested framework is based on the Inception v3 model, a widely used image recognition model that has been reported to attain an accuracy of 98.99 percent on the American Sign Language alphabet.

Figure 3: System Overview
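To make the overview concrete, the skeleton below shows one way the stages of Figure 3 could be wired together in Python with OpenCV. It is a minimal sketch: the function bodies are placeholders for the components detailed in Section V, and the capture-box coordinates are illustrative assumptions rather than values fixed by the design.

# High-level skeleton of the proposed system (Figure 3): capture a frame,
# recognise the letter, and speak it. The stage bodies are placeholders
# elaborated in Section V, not the actual implementation.
import cv2

def recognize(frame):
    """Pre-process, segment, and classify one frame (Sections V-C to V-E)."""
    return "A"  # placeholder label

def speak(letter):
    """Convert the predicted letter to audio (Section V-F)."""

cap = cv2.VideoCapture(0)                        # webcam input
ok, frame = cap.read()
if ok:
    letter = recognize(frame[100:400, 100:400])  # assumed capture box
    speak(letter)
cap.release()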
III. LITERATURE SURVEY

An examination of the literature for the proposed framework reveals that numerous attempts have been made to address sign recognition in videos and images using various methods and algorithms.

The ability to effectively communicate one's thoughts to others is a major challenge for someone with a hearing and speaking disability. Because most people are uninterested in learning sign languages, there is a pressing need to develop a system for communicating with people who are deaf or hard of hearing. Several methods have been proposed and several devices have been invented for this purpose, and we review a few of them in this paper. Every proposed solution is intended to convert sign language gestures into text and voice.

The proposed solutions divide the gesture recognition process into three stages: input, processing, and output. These stages can be realised using two different methodologies that have been proposed in various research papers: the first uses image processing and machine learning, while the second uses sensors and microcontrollers. Both methods have their own sets of benefits and drawbacks, but both are highly effective in the communication process.

Based on the surveyed papers, we can conclude that the image-based method is inexpensive and portable, and that it can be used at any time. Its main issues are with image processing under a variety of light intensities and backgrounds. This approach is less efficient than the glove-based approach, but it is less expensive. The primary benefit of a sensor-based approach is that gloves can directly acquire data (degree of bend, wrist orientation, hand motion, etc.) as computing-device voltage values, eliminating the need to process raw data into meaningful values. Environmental factors such as lighting and background are also unimportant, but the approach's biggest drawbacks are its high cost and the requirement to wear the glove at all times.

Both methodologies produce accurate results, and while both have room for improvement, both can help achieve the goal of communicating with people who have hearing and speech disabilities. The methodology proposed in this project has the potential to produce positive results, and it attempts to overcome the various challenges faced by other gesture recognition methodologies. In most cases, the results obtained are extremely accurate, with over 98 percent accuracy.

IV. RELATED WORK

A. Deep Learning Model

With deep learning's outstanding successes in the field of computer vision in recent years, it has been demonstrated that methods based on deep learning have several advantages, including rich feature extraction, powerful modelling capacity, and intuitive training [39]. Deep learning is an area of machine learning that deals with artificial neural networks, algorithms inspired by the structure and function of the brain [12]. Deep learning is a critical component of self-driving automobiles, allowing them to detect a stop sign or discriminate between a pedestrian and a lamppost. It enables voice control in consumer electronics such as phones, tablets, televisions, and hands-free speakers. Deep learning has gotten a lot of press recently, and with good cause: it is accomplishing feats that were previously unattainable.

In deep learning, a computer model learns to execute classification tasks directly from images, text, or sound. Natural language understanding is one area where deep learning is predicted to make a substantial impact in the coming years. We expect systems that use RNNs to understand words or full texts to improve considerably when they acquire strategies for selectively attending to one part at a time [31] [32]. Deep learning models can attain state-of-the-art accuracy, even surpassing human performance in some cases. A vast set of labelled data and neural network topologies are used to train the models.

B. Convolutional Neural Network (CNN)

Most state-of-the-art computer vision solutions for a wide range of problems use convolutional networks [22]. Our model uses a Convolutional Neural Network (CNN), one of the most often used kinds of deep neural network. Convolutional Neural Networks have their origins in the neocognitron, which had a similar architecture but no end-to-end supervised learning mechanism like backpropagation. For the recognition of phonemes and simple phrases, a primitive 1D CNN dubbed a time-delay neural net was utilized [29] [30]. Convolution, the mathematical linear operation between matrices, gives the network its name. The convolutional layer, non-linearity layer, pooling layer, and fully-connected layer are the main layers of a CNN. The pooling and non-linearity layers have no parameters, while the convolutional and fully-connected layers do. The CNN performs admirably on machine learning problems [13]. CNNs can have a wide range of designs, mostly determined by the task, which could be image classification, multi-class segmentation, or the localization of particular objects within a scene [26]. CNNs are typically employed to solve tough image-driven pattern recognition tasks, and their exact yet simple architecture makes getting started with ANNs a lot easier [14].
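To make the layer vocabulary concrete, the following is a minimal CNN sketch in Python with PyTorch (the framework is our choice for illustration; the layer types, not the framework, are the point), labelling the convolutional, non-linearity, pooling, and fully-connected layers described above.

# A minimal CNN showing the four layer types described above.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=26):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer (has parameters)
            nn.ReLU(),                                   # non-linearity layer (no parameters)
            nn.MaxPool2d(2),                             # pooling layer (no parameters)
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 56 * 56, num_classes)  # fully-connected layer

    def forward(self, x):  # x: (N, 3, 224, 224)
        return self.classifier(self.features(x).flatten(1))

logits = TinyCNN()(torch.randn(1, 3, 224, 224))  # -> (1, 26) class scores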
C. Transfer Learning

Transfer learning has exploded in popularity because it drastically cuts training time and needs far less data to reach good performance. Transfer learning tries to improve a target learner's performance on a target domain by transferring information from different but related source domains [27]. There are numerous instances in knowledge engineering where transfer learning can be quite advantageous; web document classification [33] [34] is an example.

Transfer learning approaches have recently been successfully employed in a variety of real-world applications. Raina et al. [35] and Dai et al. [36] [37] proposed that text data be learned across domains using transfer learning approaches. The Inception v3 model that we have used in our project builds on transfer learning: the model was trained on millions of photos using extremely high computational power, which is difficult to achieve from scratch [15]. Our approach was to take this model, already trained on images with high computational power, and then train it for our requirement and objective by providing our particular datasets of images; the results were excellent.
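As a sketch of this idea, assuming torchvision as the source of the ImageNet-pretrained weights (the paper does not prescribe a framework): load Inception v3, freeze the pre-trained feature extractor, and replace only the classifier heads for the 26 letter classes.

# Transfer learning sketch: keep the ImageNet features, adapt the classifier.
import torch.nn as nn
from torchvision.models import inception_v3, Inception_V3_Weights

model = inception_v3(weights=Inception_V3_Weights.IMAGENET1K_V1)

for p in model.parameters():   # freeze the pre-trained feature extractor
    p.requires_grad = False

# Replace the main and auxiliary classifier heads for 26 letters
# (on auxiliary classifiers, see Section D below).
model.fc = nn.Linear(model.fc.in_features, 26)
model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, 26)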
D. Inception Model

Two generations of the Inception model are relevant here:

Inception V1 – Overfitting occurred when numerous deep layers of convolutions were used in a model. To avoid this, the Inception V1 model employs the concept of numerous filters of varying sizes on the same level. As a result, instead of deeper stacks of layers, the Inception models have parallel layers, making the model wider rather than deeper.

Inception V3 – The Inception-v3 model outperforms GoogLeNet (Inception-v1) in terms of object recognition [38]. The Inception-v3 model is made up of three components: a basic convolutional block, an upgraded Inception module, and a classifier. Inception-v3 is a 48-layer deep convolutional neural network, and a pre-trained version of the network, trained on over a million photos from the ImageNet database, can be loaded. The goal of factorizing convolutions in Inception v3 is to reduce the number of connections and parameters while maintaining network efficiency [40].

The features of the Inception V3 model are the following:
- It is more efficient.
- It has a more extensive network than the Inception V1 and V2 models, but its speed is unaffected.
- It is less expensive in terms of computation.
- It employs auxiliary classifiers as regularizers.

The reason we chose Inception v3 as the model for this project is that Inception V3 is simply an improved version of Inception V1: for greater model adaptability, the Inception V3 model uses a number of approaches to optimize the network.

V. PROPOSED SYSTEM

The proposed system's design consists of the six phases listed below; a flow diagram depicting the required steps is shown in Figure 4:

1. Database Collection
2. Training of Model
3. Pre-Processing and Hand Segmentation
4. Feature Extraction
5. Classification
6. Text to Speech Conversion

Figure 4: System Flowchart

Each of these steps is detailed in the following sub-sections.
A. Database Collection

Database collection is an important part of every project: the database is what we train our model with, and to ensure the model's accuracy it is very important to collect the database from a reliable source. As this project covers both Indian Sign Language and American Sign Language, we collected the datasets from Kaggle, the largest data science community on the planet, with a wealth of tools and services to assist in achieving data science objectives [8]. We used two datasets, one for Indian Sign Language and the other for American Sign Language. For each alphabet we used around 1200 images to train our model in order to achieve better accuracy and efficiency. With 26 alphabets in total and roughly 1200 images per alphabet, the 26 x 1200 images form a massive dataset that helps us train the model more effectively and attain higher accuracy; and because the signer's hands appear in different positions and orientations for the same sign, the system is also more flexible.
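A minimal loading sketch follows, assuming the Kaggle images are arranged in one folder per letter (the exact datasets and on-disk layout are our assumption, not specified above); torchvision's ImageFolder then yields the 26 x ~1200 labelled images directly.

# Assumed on-disk layout (one folder per letter, around 1200 images each):
#   data/isl/A/0001.jpg ... data/isl/Z/1200.jpg   (path and names hypothetical)
from torchvision import datasets, transforms

tf = transforms.Compose([
    transforms.Resize((299, 299)),  # Inception v3 input size
    transforms.ToTensor(),
])
dataset = datasets.ImageFolder("data/isl", transform=tf)
print(len(dataset.classes), "classes,", len(dataset), "images")  # 26 classes, ~31200 images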
B. Training of Model

The model we used to achieve our objective is the Inception V3 model, a deep learning model for image classification that uses Convolutional Neural Networks. Inception V3 is a more advanced version of the original Inception V1 model, first released in 2014 as GoogLeNet. It was created by a Google team, as the name implies [11].

The algorithm for training the model:

Step 1. Load the dataset through the URL into the working folder.
Step 2. Read the dataset images into a variable.
Step 3. Set the training data size to 80 percent and the testing data size to 20 percent.
Step 4. Divide the images randomly to create training and testing samples.
Step 5. Perform image transformations: cropping, random rotation, and normalization.
Step 6. Check the valid images after the transformation.
Step 7. Iterate through the training images and check the various label classes.
Step 8. Set the class names as the labels to be predicted as output.
Step 9. Set the number of epochs (training cycles) to 25.
Step 10. Perform the training of the model in each cycle, increasing the accuracy at each step.
Step 11. Calculate the accuracy and training loss by testing the model.
Step 12. Save the generated model to local storage.
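The following is a condensed PyTorch rendering of Steps 1-12 as a sketch. The 80/20 split, the crop/rotation/normalisation transforms, and the 25 epochs come from the steps above; the dataset path, batch size, optimizer, and auxiliary-loss weight are our assumptions, and the model set-up repeats the transfer-learning sketch of Section IV-C for completeness.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms
from torchvision.models import inception_v3, Inception_V3_Weights

# Steps 1-5: load the images, split 80/20, apply crop/rotation/normalisation.
tf = transforms.Compose([
    transforms.RandomResizedCrop(299),   # cropping (Inception v3 input size)
    transforms.RandomRotation(10),       # rotation angle assumed
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
data = datasets.ImageFolder("data/isl", transform=tf)  # path assumed
n_train = int(0.8 * len(data))
train_set, test_set = random_split(data, [n_train, len(data) - n_train])
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
labels = data.classes                    # Step 8: class names as output labels

# Pre-trained Inception v3 with both classifier heads replaced (Section IV-C).
model = inception_v3(weights=Inception_V3_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, len(labels))
model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, len(labels))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # optimizer choice assumed

model.train()
for epoch in range(25):                  # Step 9: 25 training cycles
    for x, y in train_loader:            # Step 10: train in each cycle
        out, aux = model(x)              # Inception v3 also returns auxiliary logits
        loss = criterion(out, y) + 0.4 * criterion(aux, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

model.eval()                             # Step 11: accuracy on the held-out 20 percent
with torch.no_grad():
    correct = sum((model(x).argmax(1) == y).sum().item()
                  for x, y in DataLoader(test_set, batch_size=32))
print("test accuracy:", correct / len(test_set))

torch.save(model.state_dict(), "sign_model.pt")  # Step 12: save the model locally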
C. Pre-processing and Hand Segmentation

This is the image sensing step of the pipeline. Each picture frame is pre-processed to remove noise. OpenCV is used to start the camera module in this project. When the camera opens to capture the image of a hand gesture, a rectangular box is displayed which helps to detect the hand gesture without any background noise. Two components are central here:

OpenCV
Gary Bradski founded OpenCV at Intel in 1999 with the goal of speeding up research and commercial applications of computer vision around the world, while also driving demand for ever more powerful computers for Intel [20]. OpenCV is a large open-source computer vision, machine learning, and image processing library. Python can process the OpenCV array structure for analysis when it is combined with libraries such as NumPy. We use vector spaces and perform mathematical operations on these features to identify image patterns and their various features [16].

Segmentation
The process of dividing an image into small segments in order to extract more accurate image attributes is known as segmentation. If the segments are properly autonomous (two segments of an image should not contain any identical information), then the image's representation and description will be accurate, whereas the result of rugged segmentation will be inaccurate [19].

Before the features are extracted from the input image, a series of operations is performed on it to ensure that high-quality features are extracted. Threshold-based segmentation is used for hand segmentation [18].
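As an illustrative sketch of the capture box and threshold-based segmentation with OpenCV: the ROI coordinates and the use of Otsu's method are our assumptions, not values prescribed above.

# Capture box plus threshold-based hand segmentation (parameters assumed).
import cv2

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    x0, y0, x1, y1 = 100, 100, 400, 400
    cv2.rectangle(frame, (x0, y0), (x1, y1), (0, 255, 0), 2)  # capture box
    roi = frame[y0:y1, x0:x1]
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)                  # suppress noise
    _, hand = cv2.threshold(blur, 0, 255,
                            cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    cv2.imshow("camera", frame)
    cv2.imshow("segmented hand", hand)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()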
D. Feature Extraction

Feature extraction is a crucial step in image classification: it provides the most accurate representation of the image content [21]. Feature extraction divides and organises a large collection of raw data as part of the dimensionality-reduction process; classes are thereby reduced to smaller, easier-to-manage groups, making processing more straightforward.

Figure 5: Processing of Images

Various image pre-processing techniques, including binarization, thresholding, scaling, and normalisation, are applied to the sampled image before features are extracted, as shown in Figure 5. Following that, feature extraction techniques are used to extract features that can be used to classify and recognise images [25]. The large number of variables is the most challenging aspect of these massive data sets, since processing them requires a significant amount of computing power. By selecting and combining variables into features, feature extraction therefore helps derive the best features from large data sets, minimising the amount of data while the resulting features remain straightforward to use and still describe the actual data accurately and uniquely.

In this project, feature extraction enables the model to detect hand gestures without any background noise. Form, contour, geometrical features (position, angle, distance, etc.), colour features, histograms, and other predefined features are extracted from the pre-processed images and used later for sign classification and recognition [17].
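For illustration, a few of the predefined features named above (contour shape, a geometrical feature, and a histogram) can be computed with OpenCV from a binarised hand image such as the segmentation output; the project's full feature set is broader than this sketch.

# Illustrative predefined features from a binarised hand image.
import cv2
import numpy as np

def extract_features(binary_img):
    contours, _ = cv2.findContours(binary_img, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    hand = max(contours, key=cv2.contourArea)      # assume the hand is the largest blob
    hu = cv2.HuMoments(cv2.moments(hand)).ravel()  # contour shape descriptors
    x, y, w, h = cv2.boundingRect(hand)            # geometrical feature
    hist = cv2.calcHist([binary_img], [0], None, [16], [0, 256]).ravel()
    return np.concatenate([hu, [w / float(h)], hist / hist.sum()])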
E. Classification

Many image classification models have been developed to address the most pressing issue, identification accuracy. Image classification is a key subject in the field of computer vision, with a wide range of practical applications [28]. We used a transfer learning mechanism to train our model: the Inception V3 model used in this project is an image classifier that works on a CNN (Convolutional Neural Network) and is pre-trained on a very large dataset. By transfer learning we mean that we trained the existing Inception V3 model on our target dataset of sign languages. We then used this alphabet recognition model to predict the various labels of sign languages. The predict function takes a user image as input and maps it to the correct label according to the trained model; finally, the correct label is returned as output, as shown in Figure 6.

Figure 6: Output Labels (predicted alphabets C, W, B, R, O)
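A sketch of this predict step follows, assuming the fine-tuned model and class names from the training sketch of Section V-B; the image file name is a placeholder.

# Map one user image to the most probable letter with the trained model.
import torch
from PIL import Image
from torchvision import transforms

tf = transforms.Compose([transforms.Resize((299, 299)), transforms.ToTensor()])

def predict(model, image_path, labels):
    model.eval()
    x = tf(Image.open(image_path).convert("RGB")).unsqueeze(0)  # batch of one
    with torch.no_grad():
        probs = model(x).softmax(dim=1)
    return labels[probs.argmax(dim=1).item()]  # e.g. 'B'

# print(predict(model, "gesture.jpg", labels))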
F. Text to Speech Conversion

Speech is one of the most ancient and natural ways for humans to share information [23]. The process of turning words into an audible vocal form is known as text-to-speech (TTS). The programme, tool, or software takes a user's input text and, using natural language processing methods, deduces the linguistics of the language and performs logical inference on it. This processed text is then passed to the next block, which performs digital signal processing on it, and is finally translated into a voice format using a variety of techniques and transformations. Speech is synthesised through this entire procedure.

In this project, we used the gTTS module to convert text into speech. Google Text-to-Speech (gTTS) is a Python library and command-line utility for interacting with the Google Translate text-to-speech API [10]. The gTTS library, which can be used for voice translation, is imported from the gTTS module [9].

The text-to-speech (TTS) synthesis process is divided into two stages. The first is text analysis, in which the input text is converted into a phonetic or other linguistic representation; the second is speech waveform generation, in which the output is generated from this phonetic and prosodic information. The terms "high-level synthesis" and "low-level synthesis" are commonly used to describe these two phases [24].

The gTTS module also supports other languages, such as French, German, and Hindi. This is highly useful when there is a communication barrier and the user is unable to convey his messages to others. Text-to-speech is a wonderful benefit to those who are visually impaired or have other disabilities, since it can assist them with text-to-speech translation, and the module's support for other languages opens up a lot of possibilities.

i) Features of gTTS
gTTS provides a customizable speech-specific sentence tokenizer that can read any length of text while handling intonation, abbreviations, decimals, and other features, as well as text pre-processors that can be customised to provide features such as pronunciation, as shown in Figure 7.

Figure 7: Text to Speech part
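For example, converting a predicted letter to audio with gTTS takes only a few lines; the output file name below is arbitrary.

# Speak a recognised letter with gTTS, which sends the text to the
# Google Translate text-to-speech API and saves an MP3 file.
from gtts import gTTS

def speak(text, lang="en"):
    gTTS(text=text, lang=lang).save("letter.mp3")  # output file name arbitrary

speak("B")             # speak the predicted alphabet in English
speak("B", lang="hi")  # gTTS also supports languages such as Hindi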
VI. RESULTS

The Inception v3 model gave excellent results in classifying the sign language gestures: the accuracy we achieved for American Sign Language is 98.99%, with a training loss of 1.46%, as shown in Figure 8.

Figure 8: Accuracy and Test Loss Graph

A drawback of previous work, where including letters such as {C, L, M, N, R, U, Y} prevented researchers from achieving good accuracy, is also resolved in our project. We achieved the above-mentioned accuracy including these 7 letters; that is, including C, L, M, N, R, U, and Y, we tested our method on all 26 alphabets and achieved 99.99% accuracy with American Sign Language.

VII. CONCLUSION & FUTURE SCOPE

In this project, we have observed that transfer learning is a very efficient approach. We used the pre-trained Inception V3 model, which is based on CNN and deep neural network algorithms, and trained it on a sign language dataset with 3000 images per alphabet. This large dataset helped us to achieve greater accuracy for sign language recognition.

Figure 9: Select Alphabet Accuracy Comparison

The problem encountered in previous work, lower accuracy on some selected single-hand alphabets, is also solved, as we achieved similar accuracy for every alphabet, as shown in Figure 9.

The future scope of this project is to achieve the same accuracy while recognizing words and sentences, along with the development of a mobile application to be installed on portable devices such as smart watches and mobile phones, so that people can freely use it in their daily lives.
VIII. REFERENCES
[1] Liang R-H, Ouhyoung M (1998) A real-time continuous gesture recognition system for sign language. In: IEEE International Conference on Automatic Face and Gesture Recognition, 1998. Proceedings. Third. IEEE, pp 558–567.
[2] Liang R-H (1997) Continuous gesture recognition system for Taiwanese sign language. National Taiwan University.
[3] Pigou L., Dieleman S., Kindermans P.J., Schrauwen B. (2015) Sign Language Recognition Using Convolutional Neural Networks. In: Agapito L., Bronstein M., Rother C. (eds) Computer Vision - ECCV 2014 Workshops.
[4] Starner T, Weaver J, Pentland A (1998) Real-time American sign language recognition using desk and wearable computer based video. IEEE Trans Pattern Anal Mach Intell 20(12):1371–1375.
[5] Vogler C, Metaxas D (1997) Adapting hidden Markov models for ASL recognition by using three-dimensional computer vision methods. In: IEEE International Conference on Systems, Man and Cybernetics, vol 1. IEEE, pp 156–161.
[6] Huang XD, Ariki Y, Jack MA (1990) Hidden Markov models for speech recognition.
[7] Lichtenauer JF, Hendriks EA, Reinders MJT (2008) Sign language recognition by combining statistical DTW and independent classification. IEEE Transactions on Pattern Analysis & Machine Intelligence 30(11):2040–2046.
[8] [Link]
[9] [Link]
[10] [Link]
[11] [Link]
[12] [Link]
[13] S. Albawi, T. A. Mohammed and S. Al-Zawi, "Understanding of a convolutional neural network," 2017 International Conference on Engineering and Technology (ICET), 2017, pp. 1-6, doi: 10.1109/ICEngTechnol.2017.8308186.
[14] O'Shea, Keiron & Nash, Ryan. (2015) An Introduction to Convolutional Neural Networks. ArXiv e-prints.
[15] [Link]v3-for-image-classification-86700411251b
[16] [Link]
[17] [Link]itmconf_icacc2021_03004.pdf
[18] Kumar A, Kumar R (2020) A novel approach for ISL alphabet recognition using Extreme Learning Machine. Bharati Vidyapeeth's Institute of Computer Applications and Management.
[19] [Link]
[20] Emami, Shervin & Suciu, Valentin. (2012). Facial Recognition using OpenCV. Journal of Mobile, Embedded and Distributed Systems.
[21] Medjahed, Seyyid Ahmed (2015) A Comparative Study of Feature Extraction Methods in Images Classification. International Journal of Image, Graphics and Signal Processing.
[22] Szegedy, Christian & Vanhoucke, Vincent & Ioffe, Sergey & Shlens, Jon & Wojna, ZB. (2016). Rethinking the Inception Architecture for Computer Vision. 10.1109/CVPR.2016.308.
[23] Nwakanma, Ifeanyi & Oluigbo, Ikenna & Izunna, Okpala. (2014). Text-To-Speech Synthesis (TTS). 2. 154-163.
[24] Lemmetty, S. (1999). Review of Speech Synthesis Technology. Master's Dissertation, Helsinki University of Technology.
[25] Kumar, Gaurav & Bhatia, Pradeep. (2014). A Detailed Review of Feature Extraction in Image Processing Systems. 10.1109/ACCT.2014.74.
[26] Teja Kattenborn, Jens Leitloff, Felix Schiefer, Stefan Hinz. Review on Convolutional Neural Networks (CNN) in vegetation remote sensing. ISPRS Journal of Photogrammetry and Remote Sensing, Volume 173, 2021, Pages 24-49, ISSN 0924-2716, [Link]
[27] F. Zhuang et al., "A Comprehensive Survey on Transfer Learning," in Proceedings of the IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021, doi: 10.1109/JPROC.2020.3004555.
[28] Hussain, Mahbub & Bird, Jordan & Faria, Diego. (2018). A Study on CNN Transfer Learning for Image Classification.
[29] Waibel, A., Hanazawa, T., Hinton, G. E., Shikano, K. & Lang, K. Phoneme recognition using time-delay neural networks. IEEE Trans. Acoustics Speech Signal Process. 37, 328–339 (1989).
[30] Bottou, L., Fogelman-Soulié, F., Blanchet, P. & Lienard, J. Experiments with time delay networks and dynamic time warping for speaker independent isolated digit recognition. In Proc. EuroSpeech 89, 537–540 (1989).
[31] Bahdanau, D., Cho, K. & Bengio, Y. Neural machine translation by jointly learning to align and translate. In Proc. International Conference on Learning Representations [Link] (2015).
[32] Xu, K. et al. Show, attend and tell: Neural image caption generation with visual attention. In Proc. International Conference on Learning Representations [Link]/abs/1502.03044 (2015).
[33] G. P. C. Fung, J. X. Yu, H. Lu, and P. S. Yu, "Text classification without negative examples revisit," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 1, pp. 6–20, 2006.
[34] H. Al-Mubaid and S. A. Umair, "A new text categorization technique using distributional clustering and learning logic," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 9, pp. 1156–1165, 2006.
[35] R. Raina, A. Y. Ng, and D. Koller, "Constructing informative priors using transfer learning," in Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, Pennsylvania, USA, June 2006, pp. 713–720.
[36] W. Dai, G. Xue, Q. Yang, and Y. Yu, "Co-clustering based classification for out-of-domain documents," in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, California, USA, August 2007.
[37] W. Dai, G. Xue, Q. Yang, and Y. Yu, "Transferring naive Bayes classifiers for text classification," in Proceedings of the 22nd AAAI Conference on Artificial Intelligence, Vancouver, British Columbia, Canada, July 2007, pp. 540–545.
[38] C. Lin, L. Li, W. Luo, K. C. P. Wang, and J. Guo, "Transfer Learning Based Traffic Sign Recognition Using Inception-v3 Model," Period. Polytech. Transp. Eng., vol. 47, no. 3, pp. 242–250, 2019.
[39] S. He, "Research of a Sign Language Translation System Based on Deep Learning," 2019 International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM), 2019, pp. 392-396, doi: 10.1109/AIAM48774.2019.00083.
[40] Li Y., Zhang T., "Deep neural mapping support vector machines," Neural Networks, Vol. 93, pp. 185-194, 2017.