
CHAPTER – 1

INTRODUCTION

1.1 OVERVIEW OF SIGN LANGUAGE DETECTION:

Sign language detection is a multifaceted field that requires interdisciplinary collaboration between computer scientists, linguists, and experts in deaf culture.
Beyond its technical intricacies, it involves understanding the linguistic nuances and
cultural context of sign language to ensure accurate interpretation and translation.
Moreover, the development of sign language detection systems necessitates extensive
datasets of sign language gestures from diverse signers and backgrounds to ensure
robustness and inclusivity.
Furthermore, the impact of sign language detection extends beyond individual
interactions to broader societal implications. By breaking down communication
barriers, this technology promotes greater social inclusion and equal opportunities for
deaf and hard of hearing individuals in education, employment, and social interactions.
Moreover, it fosters cultural appreciation and understanding, encouraging a more
inclusive and accessible society for all.

Fig 1.1: Sign Language Hand Gestures


As research in sign language detection progresses, there are also ethical
considerations to address, such as privacy concerns related to video recording and
data storage, as well as ensuring equitable access to these technologies for all
members of the deaf community, including those from marginalized backgrounds.

1.2 MOTIVATION AND SIGNIFICANCE:

The motivation behind pursuing a project in sign language detection stems from
the profound impact it can have on improving the lives of deaf and hard of hearing
individuals. At its core, this project is driven by the desire to break down communication
barriers and foster greater inclusivity and accessibility for this community. By
developing technology that can accurately interpret and translate sign language
gestures into text or speech, we aim to empower individuals with hearing impairments
to communicate more effectively in various settings, including education, employment,
healthcare, and social interactions.
Additionally, this project holds significant societal importance as it endeavors
to empower deaf and hard of hearing individuals by creating technology that bridges
communication gaps through accurate sign language detection.

1.2.1 MOTIVATION:

Inclusivity and Accessibility:
To break down communication barriers for deaf and hard of hearing individuals, enable more natural and intuitive interactions with technology and society, and promote equal access to education, employment, healthcare, and social interactions.
Recognition of Sign Language:
To validate and elevate the status of sign languages as legitimate languages and acknowledge their linguistic and cultural significance.
Advancements in Technology:
To leverage machine learning and computer vision technologies, develop accurate, efficient, and user-friendly sign language detection systems, and harness the potential of technology to revolutionize communication and accessibility.

1.2.2 SIGNIFICANCE:

1. Empowerment Through Communication:


By accurately interpreting sign language gestures, it facilitates smoother and
more effective interaction not only with technology but also with the broader society.
This empowerment extends beyond mere communication, as it enables individuals to
express themselves authentically and participate fully in various aspects of life.

2. Global Impact and Collaboration:
The development and implementation of sign language detection technology
have the remarkable potential to catalyze international collaboration and cooperation
in addressing the needs of deaf and hard of hearing populations worldwide.
3. Positive Societal Impact:
The implementation of sign language detection technology promises to usher in
a myriad of positive societal changes. By creating a more inclusive and accessible
world, it ensures that individuals with hearing impairments have equal opportunities to
engage fully in society. Moreover, this technology empowers deaf and hard of hearing
individuals by providing them with the tools they need to communicate effectively and
participate actively in various aspects of life.
4. Cultural Preservation and Appreciation:
In the context of sign language detection, there exists a profound opportunity for
the preservation and promotion of the rich linguistic and cultural heritage embodied in
sign languages. By accurately recognizing and interpreting sign language gestures,
this technology plays a pivotal role in safeguarding and celebrating the unique forms
of expression within deaf communities worldwide.

1.3 TECHNOLOGICAL FRAMEWORK:


Within the technical framework of the sign language detection project, several
key components and methodologies are essential for its successful implementation:
Data Collection and Annotation:
• Gather a diverse dataset of sign language gestures performed by individuals from various backgrounds and with different signing styles.
• Annotate the dataset with corresponding sign language labels to facilitate supervised learning.
Preprocessing and Feature Extraction:
• Preprocess the collected data to remove noise, standardize hand positions, and normalize lighting conditions.
• Extract relevant features from the preprocessed data, such as hand trajectories, hand shapes, and motion dynamics, using techniques like image processing.
Model Selection and Training:
• Explore various machine learning models suited for sign language recognition tasks, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
• Train the selected models using the annotated dataset to learn the complex patterns and relationships present in sign language gestures.
Real-Time Implementation:
• Implement the trained models in real-time systems to enable live sign language detection.
• Optimize the performance of the deployed models for efficiency and speed, considering resource constraints of the target platforms.
User Interface and Accessibility:
• Design a user-friendly interface for interacting with the sign language detection system, considering the needs and preferences of deaf and hard of hearing users.
• Ensure accessibility features, such as keyboard shortcuts and text-to-speech functionalities, to accommodate users with varying abilities.

By integrating these components within the technical framework, the sign language
detection project aims to develop an accurate, efficient, and user-friendly system that
enhances accessibility and inclusion for deaf and hard of hearing individuals.

CHAPTER 2
LITERATURE REVIEW

2.1 EXISTING MODELS:

2.1.1 Overview of Existing Models:

Gesture recognition is an important topic in computer vision because of its wide range of applications, such as human–computer interaction (HCI), sign language interpretation, and visual surveillance. Krueger was the first to propose gesture recognition as a new form of interaction between human and computer, in the mid-seventies. The author designed an interactive environment called a computer-controlled responsive environment, a space within which everything the user saw or heard was in response to what he or she did. Rather than sitting down and moving only the fingers, the user interacted with his or her whole body. In one of his applications, the projection screen becomes the windshield of a vehicle that the participant uses to navigate a graphic world: by standing in front of the screen, holding out the hands, and leaning in the direction in which he or she wants to go, the user can fly through a graphic landscape. This research cannot be considered strictly a hand gesture recognition system, since the user interacts with the system not only with the hands but also with the body and fingers; we cite it nonetheless because of its importance and impact on gesture recognition systems for interaction purposes. Gesture recognition has since been adapted for various other research applications, from facial gestures to complete bodily human action, and several applications have emerged, creating a stronger need for this type of recognition system. In their study, Dong described an approach to vision-based gesture recognition for human–vehicle interaction. The models of hand gestures were built by considering gesture differentiation and human tendency, and human skin colors were used for hand segmentation. A hand tracking mechanism was suggested to locate the hand based on rotation and zooming models, and a hand-forearm separation method was able to improve the quality of hand gesture recognition. The gesture recognition itself was implemented by template matching of multiple features.

2.1.2 Real-time sign language fingerspelling recognition using CNN:
This work focuses on static fingerspelling in American Sign Language and presents a method for implementing a sign-language-to-text/voice conversion system without handheld gloves or sensors, by capturing gestures continuously and converting them to voice. In this method, only a few images were captured for recognition.

2.1.3 Design of a communication aid for physically challenged:


The system was developed under the MATLAB environment. It consists of two main phases, namely the training phase and the testing phase. In the training phase, the author used feed-forward neural networks. The drawback here is that MATLAB is not particularly efficient, and integrating the concurrent attributes as a whole is difficult.

2.1.4 Implementation using CNN:


Computer vision is a field of artificial intelligence that focuses on problems related to images and videos. Convolutional Neural Networks (CNNs) combined with computer vision are capable of solving complex problems.
A convolutional neural network has two main phases, namely feature extraction and classification. A series of convolution and pooling operations is performed to extract the features of the image. The size of the output matrix decreases as we keep applying filters: size of new matrix = (size of old matrix − filter size) + 1. A fully connected layer in the network serves as a classifier, and in the last layer the probability of each class is predicted. (A minimal sketch of such a network follows the list below.) The main steps involved in convolutional neural networks are:
1. Convolution
2. Pooling
3. Flatten
4. Full connection
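As an illustration of these four steps, the following minimal sketch stacks a convolution, a pooling layer, a flatten step, and a fully connected classifier using the Keras API; the input size (64×64 grayscale) and the number of classes are placeholder assumptions, not values taken from this project.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

num_classes = 26  # placeholder: e.g. one class per fingerspelled letter

model = models.Sequential([
    # 1. Convolution: learn local spatial features from a 64x64 grayscale image
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 1)),
    # 2. Pooling: downsample the feature maps
    layers.MaxPooling2D((2, 2)),
    # 3. Flatten: turn the feature maps into a single feature vector
    layers.Flatten(),
    # 4. Full connection: fully connected layer acting as the classifier
    layers.Dense(num_classes, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```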

Fig 2.1: Working of CNN
2.1.5 Results:
The results of sign language detection using a Convolutional Neural Network
(CNN) model showcase impressive accuracy and real-time performance, marking a
significant milestone in the field. Through meticulous training and optimization, the
CNN model exhibits remarkable proficiency in accurately classifying a wide range of
sign language gestures from live video streams or images. Moreover, the real-time
deployment of the CNN model enables seamless interaction with users, providing
instantaneous interpretation and facilitating natural communication in real-world
scenarios. This real-time capability is essential for applications requiring prompt
response times, such as communication aids and assistive technologies.
Continual refinement and iterative improvement further enhance the CNN
model's performance, ensuring adaptability to variations in sign language gestures and
evolving user needs.

Fig 2.2: Result with CNN Model

2.2 DRAWBACKS OF EXISTING MODEL

While sign language detection using a Convolutional Neural Network (CNN) model offers significant advantages, it also has some drawbacks that warrant consideration:

Limited Generalization: CNN models may struggle to generalize well to unseen sign
language gestures or variations in hand movements, especially those not adequately
represented in the training data. This limitation can result in reduced accuracy and
reliability in real-world applications.

Data Dependency: CNN models require large and diverse datasets for effective
training. Acquiring and annotating such datasets, particularly for sign language
gestures, can be time-consuming, labor-intensive, and resource-intensive. Limited or
biased training data may lead to model biases and inaccuracies.

Overfitting: CNN models are susceptible to overfitting, where the model learns to
memorize the training data instead of capturing underlying patterns. This can occur
when the model becomes too complex relative to the size and diversity of the training
data, leading to poor generalization performance on unseen data.

Complexity and Resource Requirements: CNN models, especially deep architectures, are computationally intensive and may require substantial computational
resources for training and inference. Deploying CNN models in real-time applications
may pose challenges in terms of processing speed and memory requirements,
particularly on resource-constrained devices.

Interpretability: The inherent complexity of CNN models may hinder interpretability, making it challenging to understand how the model arrives at its predictions. Lack of
interpretability may pose difficulties in diagnosing model errors, debugging, and gaining
insights into the underlying features driving classification decisions.

Adaptability to Variability: CNN models may struggle to adapt to variability in sign language gestures, such as changes in lighting conditions, background clutter, or
variations in hand shapes and movements. Robustness to such variability is crucial for
reliable performance across diverse environments and user scenarios.

Addressing these drawbacks requires careful consideration of model
architecture, data quality, training strategies, and deployment constraints. Additionally,
exploring complementary approaches, such as multimodal fusion or transfer learning,
may help mitigate some of these limitations and enhance the robustness and
effectiveness of sign language detection systems. By leveraging insights from both
technical and user-focused perspectives, future advancements in sign language
detection technology can strive to mitigate these drawbacks, ultimately paving the way
for more reliable, inclusive, and empowering solutions for individuals with hearing
impairments. Additionally, fostering greater transparency, inclusivity, and co-design
practices in the development and deployment of sign language detection systems can
help ensure that they meet the diverse needs and preferences of users, while also
promoting equity and accessibility in technology adoption.

CHAPTER – 3

METHODOLOGY AND IMPLEMENTATION

3.1 METHODOLOGIES:
3.1.1 Machine Learning:
Machine learning (ML) is a field of study in artificial intelligence concerned with
the development and study of statistical algorithms that can learn from data and
generalize to unseen data, and thus perform tasks without explicit instructions.
Recently, artificial neural networks have been able to surpass many previous
approaches in performance. Machine learning approaches have been applied to many
fields including natural language processing, computer vision, speech recognition,
email filtering, agriculture, and medicine. ML is known in its application across business
problems under the name predictive analytics. Although not all machine learning is
statistically based, computational statistics is an important source of the field's
methods.
The mathematical foundations of ML are provided by mathematical optimization
(mathematical programming) methods. Data mining is a related (parallel) field of study,
focusing on exploratory data analysis (EDA) through unsupervised learning. Modern-
day machine learning has two objectives. One is to classify data based on models
which have been developed; the other purpose is to make predictions for future
outcomes based on these models. A hypothetical algorithm specific to classifying data
may use computer vision of moles coupled with supervised learning in order to train it
to classify the cancerous moles. A machine learning algorithm for stock trading may
inform the trader of future potential predictions. Machine learning grew out of the quest
for artificial intelligence (AI). In the early days of AI as an academic discipline, some
researchers were interested in having machines learn from data. They attempted to
approach the problem with various symbolic methods, as well as what were then
termed "neural networks"; these were mostly perceptions and other models that were
later found to be reinventions of the generalized linear models of statistics. Probabilistic
reasoning was also employed, especially in automated medical diagnosis.

Fig 3.1: Techniques

3.1.2 SUPERVISED LEARNING:


Supervised learning (SL) is a paradigm in machine learning where input objects
(for example, a vector of predictor variables) and a desired output value (also known
as human-labeled supervisory signal) train a model. The training data is processed,
building a function that maps new data on expected output values. An optimal scenario
will allow for the algorithm to correctly determine output values for unseen instances.
This requires the learning algorithm to generalize from the training data to unseen
situations in a "reasonable" way (see inductive bias). This statistical quality of an
algorithm is measured through the so-called generalization error.
Figure 3.2 illustrates the tendency for a task to employ supervised vs. unsupervised methods. Task names straddling the circle boundaries are intentional: they show that the classical division of imaginative tasks (left) employing unsupervised methods is blurred in today's learning schemes.

Fig 3.2: Supervised/Unsupervised

3.1.3 CLASSIFICATION:
Classification is defined as the process of recognizing, understanding, and grouping objects and ideas into preset categories, a.k.a. "sub-populations." With the help of these pre-categorized training datasets, classification programs in machine learning leverage a wide range of algorithms to classify future datasets into respective and relevant categories.
Classification algorithms used in machine learning utilize input training data for
the purpose of predicting the likelihood or probability that the data that follows will fall
into one of the predetermined categories. One of the most common applications of
classification is for filtering emails into “spam” or “non-spam”, as used by today’s top
email service providers.

3.1.4 MULTI CLASS CLASSIFICATION:

The multi-class classification does not have the idea of normal and abnormal
outcomes, in contrast to binary classification. Instead, instances are grouped into one
of several well-known classes. In some cases, the number of class labels could be
rather high. In a facial recognition system, for instance, a model might predict that a
shot belongs to one of thousands or tens of thousands of faces.

Multiclass classification is a classification task with more than two classes. Each
sample can only be labelled as one class. For example, classification using features
extracted from a set of images of fruit, where each image may either be of an orange,
an apple, or a pear. Each image is one sample and is labelled as one of the 3 possible
classes. Multiclass classification makes the assumption that each sample is assigned
to one and only one label - one sample cannot, for example, be both a pear and an
apple. While all scikit-learn classifiers are capable of multiclass classification, the meta-estimators offered by sklearn.multiclass permit changing the way they handle more than two classes, because this may have an effect on classifier performance (either in terms of generalization error or required computational resources).
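As a brief, hedged illustration of the scikit-learn meta-estimators mentioned above, the sketch below wraps a linear classifier in a one-vs-rest scheme on a synthetic three-class problem; the dataset and all parameter values are illustrative assumptions, not part of this project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Synthetic three-class data standing in for e.g. orange / apple / pear features
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

# OneVsRestClassifier trains one binary LinearSVC per class
clf = OneVsRestClassifier(LinearSVC()).fit(X_train, y_train)
print('held-out accuracy:', clf.score(X_test, y_test))
```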

Fig 3.3: Multi Class Classification

3.1.5 DECISION TREES:


A decision tree is a decision support hierarchical model that uses a tree-like
model of decisions and their possible consequences, including chance event
outcomes, resource costs, and utility. It is one way to display an algorithm that only
contains conditional control statements. Decision trees are commonly used in
operations research, specifically in decision analysis to help identify a strategy most
likely to reach a goal, but are also a popular tool in machine learning. A decision tree
is a flowchart-like structure in which each internal node represents a "test" on an
attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the
outcome of the test, and each leaf node represents a class label (decision taken after
computing all attributes). The paths from root to leaf represent classification rules. In
decision analysis, a decision tree and the closely related influence diagram are used
as a visual and analytical decision support tool, where the expected values (or
expected utility) of competing alternatives are calculated. A decision tree consists of three types of nodes: decision nodes, typically represented by squares; chance nodes, typically represented by circles; and end nodes, typically represented by triangles. Decision trees are commonly used in operations research and operations management. If, in practice, decisions have to be taken online with no recall under incomplete knowledge, a decision tree should be paralleled by a probability model as a best choice model or online selection model algorithm. Another use of decision trees is as a descriptive means for calculating conditional probabilities. Decision trees, influence diagrams, utility functions, and other decision analysis tools and methods are taught to undergraduate students in schools of business, health economics, and public health, and are examples of operations research or management science methods.

Drawn from left to right, a decision tree has only burst nodes (splitting paths) but no sink nodes (converging paths). When used manually, decision trees can therefore grow very big and are often hard to draw fully by hand. Traditionally, decision trees have been created manually, although increasingly specialized software is employed.

Fig 3.4: Decision Tree


Decision rules:
The decision tree can be linearized into decision rules, where the outcome is
the contents of the leaf node, and the conditions along the path form a conjunction in
the if clause. In general, the rules have the form:
if condition1 and condition2 and condition3 then outcome.
Decision rules can be generated by constructing association rules with the target
variable on the right. They can also denote temporal or causal relations.
Optimizing a decision tree:
A few things should be considered when improving the accuracy of the decision
tree classifier. The following are some possible optimizations to consider when looking
to make sure the decision tree model produced makes the correct decision or
classification. Note that these are not the only possible optimizations, but only some of them.
Increasing the number of levels of the tree:
The accuracy of the decision tree can change based on the depth of the decision
tree. In many cases, the tree’s leaves are pure nodes. When a node is pure, it means
that all the data in that node belongs to a single class. For example, if the classes in
the data set are Cancer and Non-Cancer a leaf node would be considered pure when
all the sample data in a leaf node is part of only one class, either cancer or non-cancer.
It is important to note that a deeper tree is not always better when optimizing the
decision tree. A deeper tree can influence the runtime in a negative way. If a certain
classification algorithm is being used, then a deeper tree could mean the runtime of
this classification algorithm is significantly slower. There is also the possibility that the
actual algorithm building the decision tree will get significantly slower as the tree gets

deeper. If the tree-building algorithm being used splits pure nodes, then a decrease in
the overall accuracy of the tree classifier could be experienced. Occasionally, going
deeper in the tree can cause an accuracy decrease in general, so it is very important
to test modifying the depth of the decision tree and selecting the depth that produces
the best results. To summarize, observe the points below, we will define the number D
as the depth of the tree.
Possible advantages of increasing the number D are Accuracy of the decision-
tree classification model increases. Possible disadvantages of increasing D is Runtime
issues also decrease in accuracy.
The ability to test the differences in classification results when changing D is
imperative. We must be able to easily change and test the variables that could affect
the accuracy and reliability of the decision tree-model.

The choice of node-splitting functions:


The node splitting function used can have an impact on improving the accuracy
of the decision tree. For example, using the information-gain function may yield better
results than using the phi function. The phi function is known as a measure of
“goodness” of a candidate split at a node in the decision tree. The information gain
function is known as a measure of the “reduction in entropy”. In the following, we will
build two decision trees. One decision tree will be built using the phi function to split
the nodes and one decision tree will be built using the information gain function to split
the nodes.

The main advantages and disadvantages of information gain and phi function:
One major drawback of information gain is that the feature that is chosen as the next
node in the tree tends to have more unique values.
An advantage of information gain is that it tends to choose the most impactful features
that are close to the root of the tree. It is a very good measure for deciding the relevance
of some features. The phi function is also a good measure for deciding the relevance
of some features based on "goodness". The information gain formula states that the information gain is the entropy of a node t of the decision tree minus the entropy of a candidate split s at node t:
Gain(s) = H(t) − H(s, t)

The phi function is maximized when the chosen feature splits the samples in a way that produces homogeneous splits, with around the same number of samples in each split:
Phi(s, t) = (2 * P_L * P_R) * Q(s | t)
where P_L and P_R are the fractions of samples sent to the left and right child nodes, and Q(s | t) measures how differently the split distributes the classes between the two children.
We will set D, which is the depth of the decision tree we are building, to three
(D = 3). We also have the following data set of cancer and non-cancer samples and
the mutation features that the samples either have or do not have. If a sample has a
feature mutation then the sample is positive for that mutation, and it will be represented
by one. If a sample does not have a feature mutation then the sample is negative for
that mutation, and it will be represented by zero. To summarize, C stands for cancer
and NC stands for non-cancer. The letter M stands for mutation, and if a sample has a
particular mutation it will show up in the table as a one and otherwise zero.

Table 3.1: Phi and Gain Function


The sample data:
Sample  M1  M2  M3  M4  M5
C1       0   1   0   1   1
NC1      0   0   0   0   0
NC2      0   0   1   1   0
NC3      0   0   0   0   0
C2       1   1   1   1   1
NC4      0   0   0   1   0
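To make the split criteria concrete, here is a small, hedged sketch that computes the entropy-based information gain of a candidate split on the sample data from the table above; it is illustrative only and not code from this project.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H of a list of class labels."""
    if not labels:
        return 0.0
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(labels, feature):
    """Information gain of splitting `labels` on a binary feature column."""
    left = [l for l, f in zip(labels, feature) if f == 1]
    right = [l for l, f in zip(labels, feature) if f == 0]
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

# Class labels and the M4 column from the sample data above
labels = ['C', 'NC', 'NC', 'NC', 'C', 'NC']
m4 = [1, 0, 1, 0, 1, 1]
print('information gain of splitting on M4:', information_gain(labels, m4))
```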

Now, we can use the formulas to calculate the phi function values and
information gain values for each M in the dataset. Once all the values are calculated
the tree can be produced. The first thing to be done is to select the root node. In
information gain and the phi function we consider the optimal split to be the mutation
that produces the highest value for information gain or the phi function. Now assume
that M1 has the highest phi function value and M4 has the highest information gain
value. The M1 mutation will be the root of our phi function tree and M4 will be the root
of our information gain tree.
The left node is the root node of the tree we are building using the phi function to split
the nodes. The right node is the root node of the tree we are building using information
gain to split the nodes.

Now, once we have chosen the root node we can split the samples into two
groups based on whether a sample is positive or negative for the root node mutation.
The groups will be called group A and group B. For example, if we use M1 to split the
samples in the root node we get NC2 and C2 samples in group A and the rest of the
samples NC4, NC3, NC1, C1 in group B.
Disregarding the mutation chosen for the root node, proceed to place the next
best features that have the highest values for information gain or the phi function in the
left or right child nodes of the decision tree. Once we choose the root node and the two
child nodes for the tree of depth = 3 we can just add the leaves. The leaves will
represent the final classification decision the model has produced based on the
mutations a sample either has or does not have. The left tree is the decision tree we
obtain from using information gain to split the nodes and the right tree is what we obtain
from using the phi function to split the nodes.The resulting tree from using information
gain to split the nodes.
Now assume the classification results from both trees are given using a confusion
matrix.
Table 3.2: Gain Confusion Matrix
            Predicted: C   Predicted: NC
Actual: C        1               1
Actual: NC       0               4

Table 3.3: Phi Function Confusion Matrix
            Predicted: C   Predicted: NC
Actual: C        1               1
Actual: NC       0               6

The tree built using information gain yields the same accuracy as the tree built using the phi function. When we classify the samples based on the model
using information gain we get one true positive, one false positive, zero false negatives,
and four true negatives. For the model using the phi function we get two true positives,
zero false positives, one false negative, and three true negatives. The next step is to
evaluate the effectiveness of the decision tree using some key metrics that will be
discussed in the evaluating a decision tree section below. The metrics that will be
discussed below can help determine the next steps to be taken when optimizing the
decision tree.

3.1.6. RANDOM FOREST:
Random forests or random decision forests is an ensemble learning method for
classification, regression and other tasks that operates by constructing a multitude of
decision trees at training time. For classification tasks, the output of the random forest
is the class selected by most trees. For regression tasks, the mean or average
prediction of the individual trees is returned. Random decision forests correct for
decision trees' habit of overfitting to their training set.
The following ideas form the basis of the modern practice of random forests, in particular:
• Using out-of-bag error as an estimate of the generalization error.
• Measuring variable importance through permutation.
The report also offers the first theoretical result for random forests in the form of a
bound on the generalization error which depends on the strength of the trees in the
forest and their correlation. Decision trees are a popular method for various machine
learning tasks. Tree learning "come[s] closest to meeting the requirements for serving
as an off-the-shelf procedure for data mining", "because it is invariant under scaling
and various other transformations of feature values, is robust to inclusion of irrelevant
features, and produces inspectable models. However, they are seldom accurate".

Fig 3.5: Random Forest Working (the training set is split into training data 1 … n, a decision tree is trained on each subset, and the trees' votes on the test set produce the final prediction)


In particular, trees that are grown very deep tend to learn highly irregular
patterns: they overfit their training sets, i.e. have low bias, but very high variance.

Random forests are a way of averaging multiple deep decision trees, trained on
different parts of the same training set, with the goal of reducing the variance. This
comes at the expense of a small increase in the bias and some loss of interpretability,
but generally greatly boosts the performance in the final model. The training algorithm
for random forests applies the general technique of bootstrap aggregating, or bagging,
to tree learners. Given a training set X = x1, ..., xn with responses Y = y1, ..., yn, bagging
repeatedly (B times) selects a random sample with replacement of the training set and
fits trees to these samples:
For b = 1, ..., B:
1. Sample, with replacement, n training examples from X, Y; call these Xb, Yb.
2. Train a classification or regression tree fb on Xb, Yb.
After training, predictions for unseen samples x' can be made by averaging the
predictions from all the individual regression trees on x', or by taking the plurality vote in
the case of classification trees. This bootstrapping procedure leads to better model
performance because it decreases the variance of the model, without increasing the
bias. This means that while the predictions of a single tree are highly sensitive to noise
in its training set, the average of many trees is not, as long as the trees are not
correlated. Simply training many trees on a single training set would give strongly
correlated trees (or even the same tree many times, if the training algorithm is
deterministic); bootstrap sampling is a way of de-correlating the trees by showing them
different training sets.
Additionally, an estimate of the uncertainty of the prediction can be made as the standard deviation of the predictions from all the individual regression trees on x'.
The number of samples/trees, B, is a free parameter. Typically, a few hundred to
several thousand trees are used, depending on the size and nature of the training set.
An optimal number of trees B can be found using cross-validation, or by observing the
out-of-bag error: the mean prediction error on each training sample xi, using only the
trees that did not have xi in their bootstrap sample. The training and test error tend to
level off after some number of trees have been fit.

Fig 3.6: Bagging Algorithm

The above procedure describes the original bagging algorithm for trees.
Random forests also include another type of bagging scheme: they use a modified tree
learning algorithm that selects, at each candidate split in the learning process, a
random subset of the features. This process is sometimes called "feature bagging".
The reason for doing this is the correlation of the trees in an ordinary bootstrap sample:
if one or a few features are very strong predictors for the response variable (target
output), these features will be selected in many of the B trees, causing them to become
correlated. An analysis of how bagging and random subspace projection contribute to
accuracy gains under different conditions is given by Ho.
Typically, for a classification problem with p features, √p (rounded down)
features are used in each split. For regression problems the inventors recommend p/3
(rounded down) with a minimum node size of 5 as the default. In practice, the best
values for these parameters should be tuned on a case-to-case basis for every
problem.
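As a hedged illustration of bagging with per-split feature subsampling (√p features) and the out-of-bag error estimate described above, the following sketch uses scikit-learn's RandomForestClassifier on a synthetic dataset; every parameter value here is illustrative rather than taken from this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=16, n_informative=8,
                           n_classes=3, random_state=0)

# B = 300 bootstrap-trained trees; sqrt(p) features considered at each split;
# oob_score=True scores each sample with only the trees that did not see it.
forest = RandomForestClassifier(n_estimators=300, max_features='sqrt',
                                oob_score=True, random_state=0)
forest.fit(X, y)
print('out-of-bag accuracy estimate:', forest.oob_score_)
```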

RANDOM FOREST ALGORITHM STRUCTURE:
Fig 3.7: Random Forest Algorithm (training phase: training samples → sample description → train classifier; detection phase: image from video → skin detection → detection windows → classification by the trained classifier → position and gesture output)

3.2 DESIGN AND IMPLEMENTATION:

1. Design
Gesture recognition provides real-time data to a computer to make it fulfill the
user’s commands. Motion sensors in a device can track and interpret gestures, using
them as the primary source of data input. A majority of gesture recognition solutions
feature a combination of 3D depth-sensing cameras and infrared cameras together
with machine learning systems.
Gesture recognition consists of three basic levels:
i) Detection. With the help of a camera, a device detects hand or body movements, and a machine learning algorithm segments the image to find hand edges and positions.
ii) Tracking. A device monitors movements frame by frame to capture every movement and provide accurate input for data analysis.
iii) Recognition. The system tries to find patterns based on the gathered data. When the system finds a match and interprets a gesture, it performs the action associated with this gesture. Feature extraction and classification in the scheme below implement the recognition functionality.

Fig 3.8: Block diagram (image frame acquisition → hand segmentation → hand tracking → feature extraction → classification → output gesture)

Google announced a new approach to hand perception implemented in MediaPipe, a cross-platform framework for building multimodal machine learning pipelines.
Fig 3.13: Workflow (each incoming frame passes through the palm detector, the hand landmark extractor, and the gesture recognizer)

IMPLEMENTATION:

The whole project proceeds in four steps:
1) Input Data Collection
2) Creating Datasets
3) Training Model
4) Inference-Output

3.2.1 INPUT DATA COLLECTION:
In the data collection phase of the sign language detection project, the process
begins with identifying the target sign language gestures, considering factors such as
cultural relevance and practical application scenarios. Various data sources are
explored to gather a comprehensive dataset, including publicly available repositories,
crowdsourcing platforms, or proprietary recording methods. Depending on the
availability of existing datasets and the specific requirements of the project, a
combination of these sources may be utilized to ensure dataset diversity and
adequacy. Once the data is acquired, it undergoes rigorous annotation to assign
accurate labels to each sample, often involving collaboration with sign language
experts or native speakers to ensure linguistic and cultural authenticity. Augmentation
techniques are then applied to enrich the dataset with variations in lighting conditions,
backgrounds, and hand orientations, enhancing the model's robustness to real-world
scenarios. Preprocessing steps such as resizing, cropping, and color normalization are
performed to standardize the data format and optimize computational efficiency. The
dataset is subsequently divided into training, validation, and test sets using stratified
sampling to maintain gesture distribution balance across subsets. Quality control
measures, including data cleaning and outlier detection, are employed to identify and
rectify errors or inconsistencies in the dataset, thereby ensuring the integrity and
reliability of the training process. Throughout the data collection phase, stringent
adherence to privacy regulations and ethical guidelines is prioritized to safeguard the
rights and confidentiality of participants involved in the dataset creation process. This
holistic approach to data collection establishes a solid foundation for subsequent model
development and evaluation, ultimately contributing to the successful deployment of
an accurate and inclusive sign language detection system.
The code begins by initializing a directory named Data_Dir, designated for
storing the captured images, and sets up a video capture object cap to retrieve frames
from the default camera, identified by index 0. It proceeds to iterate over a specified
range of number_of_classes, creating subdirectories within Data_Dir, for each class if
they do not already exist. Within each class directory, the code captures a
predetermined number of frames, specified by dataset_size, utilizing the cap.read()
function, and saves them as JPEG images through cv2.imwrite(). Before capturing
each frame, the code overlays a message onto the frame, prompting the user to press
'Q' when ready to capture the image.

Upon completion of the data collection process, the video capture is released
(cap.release()), and all OpenCV windows are closed (cv2.destroyAllWindows()),
ensuring proper termination of the application. This streamlined approach ensures
efficient and organized image collection while providing clear instructions to the user
throughout the process.
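The collection script itself is not reproduced in the report, so the following is a hedged reconstruction based on the description above; the directory name, number of classes, and dataset size are assumptions.

```python
import os
import cv2

DATA_DIR = './data'        # directory for captured images (name assumed)
number_of_classes = 3      # e.g. 'TOILET', 'FOOD', 'WATER' (assumed)
dataset_size = 100         # frames captured per class (assumed)

cap = cv2.VideoCapture(0)  # default camera, index 0
for class_id in range(number_of_classes):
    class_dir = os.path.join(DATA_DIR, str(class_id))
    os.makedirs(class_dir, exist_ok=True)

    # Wait until the user signals readiness by pressing 'Q'
    while True:
        ret, frame = cap.read()
        cv2.putText(frame, 'Ready? Press "Q" to capture!', (50, 50),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
        cv2.imshow('frame', frame)
        if cv2.waitKey(25) == ord('q'):
            break

    # Capture and save dataset_size frames for this class
    for i in range(dataset_size):
        ret, frame = cap.read()
        cv2.imshow('frame', frame)
        cv2.waitKey(25)
        cv2.imwrite(os.path.join(class_dir, f'{i}.jpg'), frame)

cap.release()
cv2.destroyAllWindows()
```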

3.2.2 CREATING DATASETS:


Following the initial input collection phase, the dataset creation process entails
a series of meticulously orchestrated steps aimed at transforming raw data into a
refined, high-quality resource suitable for training machine learning models. Beginning
with the collected data, whether in the form of images, videos, or other media, preprocessing techniques are applied to standardize and enhance its suitability for
analysis. This may involve tasks such as resizing images to a consistent resolution,
normalizing pixel values to a common scale, and augmenting the dataset to introduce
variability and robustness. Augmentation methods could include introducing simulated
noise, applying geometric transformations, or adjusting lighting conditions to mimic
real-world scenarios. Subsequently, the dataset is partitioned into distinct subsets,
typically comprising training, validation, and test sets, ensuring that each subset
contains a representative sample of the overall data distribution. Annotation, a crucial
step in dataset creation, involves labelling each data instance with relevant metadata
or ground truth information. This process often requires expertise from domain
specialists or linguists, particularly in the case of sign language datasets, where
accurate interpretation and labelling of gestures are paramount. Quality assurance
measures are then implemented to identify and mitigate any inconsistencies, errors, or
biases within the dataset, thereby ensuring its integrity and reliability for subsequent
model training and evaluation. Throughout these stages, strict adherence to ethical
guidelines and privacy regulations is maintained to safeguard the privacy and dignity
of individuals contributing to the dataset. By meticulously curating and refining the
dataset, researchers and practitioners can build robust machine learning models
capable of accurate and reliable performance in real-world applications, thus
advancing the field of sign language recognition and fostering greater inclusivity and
accessibility.

Utilizing the MediaPipe library (mp), the code employs hand landmark detection
techniques to analyze images. It initializes a Hands object configured for static image
mode, setting a minimum detection confidence threshold of 0.3 to ensure reliable
detection. The process involves iterating through the directories and images within the
designated DATA_DIR, where each image is accessed using OpenCV's (cv2.imread())
functionality and subsequently converted to the RGB format (cv2.cvtColor()). Upon
processing each image, the code utilizes the Hands object to extract hand landmarks,
accessible via results.multi_hand_landmarks. For each detected hand, it meticulously
extracts the x and y coordinates of individual landmarks, subsequently normalizing them
relative to the minimum x and y coordinates found within the image. This normalization
procedure involves subtracting the minimum x-coordinate (x - min(x_)) and y-coordinate (y
- min(y_)), facilitating consistent and standardized data representation across different
images and hand configurations. Through this intricate process, the code achieves precise
and structured extraction of hand landmarks, enabling subsequent analysis and
interpretation of hand gestures with enhanced accuracy and reliability.
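A hedged sketch of the landmark-extraction step described above follows; the directory layout and the output file name (data.pickle, referenced in the training section) are assumptions consistent with the description.

```python
import os
import pickle
import cv2
import mediapipe as mp

DATA_DIR = './data'  # same directory as the collection step (name assumed)
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(static_image_mode=True, min_detection_confidence=0.3)

data, labels = [], []
for class_dir in os.listdir(DATA_DIR):
    for img_name in os.listdir(os.path.join(DATA_DIR, class_dir)):
        img = cv2.imread(os.path.join(DATA_DIR, class_dir, img_name))
        img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        results = hands.process(img_rgb)
        if not results.multi_hand_landmarks:
            continue  # skip images where no hand was detected
        for hand_landmarks in results.multi_hand_landmarks:
            x_ = [lm.x for lm in hand_landmarks.landmark]
            y_ = [lm.y for lm in hand_landmarks.landmark]
            # Normalize each landmark relative to the minimum x and y coordinates
            data_aux = []
            for lm in hand_landmarks.landmark:
                data_aux.extend([lm.x - min(x_), lm.y - min(y_)])
            data.append(data_aux)
            labels.append(class_dir)

with open('data.pickle', 'wb') as f:
    pickle.dump({'data': data, 'labels': labels}, f)
```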

3.2.3 TRAINING THE MODEL:


In the training phase, the integration of the Random Forest algorithm with the
MediaPipe library enriches the development process of the sign language detection
model, enabling a comprehensive approach to feature extraction and classification.
Leveraging MediaPipe's hand landmark detection capabilities, the preprocessing step
gains significant depth by extracting intricate hand features, such as landmark
positions and spatial relationships, from the input images or frames. This rich feature
set provides the Random Forest classifier with detailed information crucial for
accurately discerning various sign language gestures. Moreover, the flexibility of the
Random Forest algorithm allows for robust handling of complex feature interactions
and noise inherent in real-world data. By training on a diverse dataset meticulously
prepared with MediaPipe's outputs, the model becomes adept at recognizing subtle
nuances and variations in hand gestures, enhancing its overall performance and
adaptability. Throughout the training process, rigorous evaluation on validation sets
ensures that the model generalizes well to unseen data, fostering confidence in its real-
world applicability. Furthermore, the seamless integration of MediaPipe's hand
landmark detection functionality with the Random Forest model enables streamlined
deployment, empowering the system to deliver real-time sign language detection with
unparalleled accuracy and reliability.
This cohesive synergy between advanced feature extraction techniques and
powerful classification algorithms underscores the effectiveness of the combined
approach in developing state-of-the-art sign language detection systems poised to
make a meaningful impact in accessibility and communication domains.
The code begins by loading the hand landmark data and corresponding labels
from the pickle file named data.pickle using the pickle.load() function, converting them
into NumPy arrays to facilitate further processing. Next, the data is divided into training
and testing sets using the train_test_split() function from sklearn.model_selection, with
20% of the data reserved for testing purposes and the remaining portion allocated for
training. Subsequently, a RandomForestClassifier model is instantiated from
sklearn.ensemble, initialized with default parameters, and trained on the training data
(x_train and y_train) using the model.fit() method. Following training, the trained model
is employed to predict the labels for the test data (x_test) utilizing the model.predict()
function, enabling the calculation of prediction accuracy via accuracy_score() from
sklearn.metrics. The resulting accuracy score, representing the percentage of correctly
classified samples, is then printed for evaluation purposes. Additionally, to facilitate
future usage, the trained model is serialized into a pickle file named model.p, ensuring
its preservation and accessibility for subsequent applications or analyses. This
meticulous procedure ensures a systematic and comprehensive approach to model
training and evaluation, culminating in the development of a robust and reliable sign
language detection system.
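A hedged sketch of the training and evaluation procedure described above follows, using the data.pickle and model.p file names from the text; the stratified split is an assumption based on the sampling strategy mentioned in Section 3.2.1.

```python
import pickle
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the landmark features and labels produced in the dataset-creation step
data_dict = pickle.load(open('data.pickle', 'rb'))
data = np.asarray(data_dict['data'])
labels = np.asarray(data_dict['labels'])

# Hold out 20% of the samples for testing, keeping the class balance
x_train, x_test, y_train, y_test = train_test_split(
    data, labels, test_size=0.2, shuffle=True, stratify=labels)

model = RandomForestClassifier()  # default parameters, as described above
model.fit(x_train, y_train)

y_predict = model.predict(x_test)
print(f'{accuracy_score(y_test, y_predict) * 100:.2f}% of samples classified correctly')

# Serialize the trained model for the inference stage
with open('model.p', 'wb') as f:
    pickle.dump({'model': model}, f)
```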

3.2.4 INFERENCE OUTPUT:


After training the model using a combination of Random Forest algorithm,
decision trees, and the MediaPipe library, the inference output stage begins with
loading the trained model into memory, leveraging appropriate functions or libraries.
Following this, the input data, typically consisting of images or video frames, undergoes
preprocessing with the MediaPipe library to extract hand landmarks and relevant
features. These features are then used to prepare the input data in a format suitable
for the trained model, ensuring alignment with its expected input requirements.
Subsequently, the preprocessed data is fed into the ensemble of decision trees
comprising the Random Forest model, where each decision tree independently
processes the input features to make individual predictions.

The ensemble then aggregates these predictions to produce a final output,
which could include class labels, probabilities, or regression values, depending on the
nature of the task. Post-processing techniques may be applied to interpret or visualize
the model's predictions, such as mapping predicted class labels to their corresponding
sign language gestures or generating visualizations of decision boundaries. Finally, the
inference output is presented or utilized according to the specific requirements of the
application, enabling real-time sign language detection, gesture recognition, or other
related tasks. Throughout this process, careful consideration is given to ensuring the
accuracy, robustness, and efficiency of the model's predictions, with ongoing
monitoring and validation strategies employed to assess performance in diverse real-
world scenarios and identify opportunities for refinement and optimization.
The code integrates the functionalities of the MediaPipe library (mp) and a trained
Random Forest classifier model to facilitate real-time hand gesture recognition from a webcam
feed (cap). Leveraging MediaPipe's hand landmark detection capabilities, the code employs
mp_drawing.draw_landmarks() to overlay detected landmarks onto each frame, providing a
visual representation of hand gestures. Additionally, the trained Random Forest classifier
model, previously serialized into a pickle file (model.p), is loaded using pickle.load() to predict
hand gestures based on the extracted landmarks. For each frame processed, the code extracts
the x and y coordinates of each hand landmark, normalizes them relative to the minimum x
and y coordinates, and compiles them into a dataset (data_aux) for prediction. Subsequently,
the model predicts the hand gesture, and the predicted label is mapped to a corresponding
action using a dictionary (labels_dict). In cases where the predicted gesture corresponds to
actions such as 'TOILET', 'FOOD', or 'WATER', the code invokes the speak() function to play
a specific sound file indicative of the action, enhancing user interaction and accessibility.
Furthermore, the predicted gesture is displayed on the frame, providing users with visual
feedback. Through the seamless integration of MediaPipe, Random Forest algorithm, and
custom action mapping, the code facilitates real-time interpretation of hand gestures, enabling
interactive applications with audiovisual feedback for enhanced user experience and
accessibility.
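Finally, a hedged sketch of the real-time inference loop described above; the gesture-to-label mapping, the sound-file names, and the speak() helper are assumptions used for illustration.

```python
import pickle
import cv2
import mediapipe as mp
import numpy as np

def speak(sound_file):
    # Placeholder for the sound playback described in the text,
    # e.g. using a dedicated audio library (assumed)
    print(f'playing {sound_file}')

model = pickle.load(open('model.p', 'rb'))['model']
labels_dict = {0: 'TOILET', 1: 'FOOD', 2: 'WATER'}  # class indices assumed

mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils
hands = mp_hands.Hands(static_image_mode=False, min_detection_confidence=0.3)

cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            # Overlay the detected landmarks on the frame
            mp_drawing.draw_landmarks(frame, hand_landmarks,
                                      mp_hands.HAND_CONNECTIONS)
            x_ = [lm.x for lm in hand_landmarks.landmark]
            y_ = [lm.y for lm in hand_landmarks.landmark]
            data_aux = []
            for lm in hand_landmarks.landmark:
                data_aux.extend([lm.x - min(x_), lm.y - min(y_)])
            predicted = labels_dict[int(model.predict([np.asarray(data_aux)])[0])]
            if predicted in ('TOILET', 'FOOD', 'WATER'):
                speak(predicted.lower() + '.mp3')  # assumed sound-file naming
            cv2.putText(frame, predicted, (10, 40),
                        cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 0, 255), 3)
    cv2.imshow('frame', frame)
    if cv2.waitKey(1) == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
```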

CHAPTER 4
SOFTWARE ENVIRONMENT
4.1 Python :
Python is a high-level, interpreted, interactive and object-oriented scripting
language. Python is designed to be highly readable. It uses English keywords
frequently, whereas other languages use punctuation, and it has fewer syntactical
constructions than other languages.

Fig 4.1: Python

Python is Interpreted − Python is processed at runtime by the interpreter. You do not need to compile your program before executing it. This is similar to PERL and PHP.

Python is Interactive − You can actually sit at a Python prompt and interact with the
interpreter directly to write your programs.

Python is Object-Oriented − Python supports Object-Oriented style or technique of programming that encapsulates code within objects.

Python is a Beginner's Language − Python is a great language for beginner-level programmers and supports the development of a wide range of applications from
simple text processing to WWW browsers to games.

Interactive Mode − Python has support for an interactive mode which allows
interactive testing and debugging of snippets of code.

Portable − Python can run on a wide variety of hardware platforms and has the same
interface on all platforms.

4.2 MODULES USED
1. NUMPY:
In this project, NumPy plays a pivotal role in handling data arrays and facilitating
numerical computations essential for sign language detection using machine learning
algorithms. Leveraging NumPy's powerful array data structure, the project efficiently
represents and manipulates data, including hand landmarks extracted from the
MediaPipe library and feature vectors derived from image inputs. NumPy's
comprehensive suite of functions enables seamless data preprocessing tasks such as
normalization, scaling, and feature extraction, ensuring that the input data is
appropriately formatted and prepared for model training. Additionally, NumPy's
extensive support for array operations, including arithmetic operations, slicing, and
indexing, facilitates the implementation of complex algorithms for feature manipulation
and computation. Furthermore, NumPy's seamless integration with machine learning
libraries like scikit-learn enables effortless conversion of data arrays to compatible
formats for model training and evaluation, streamlining the development process.
Overall, NumPy's efficiency, versatility, and integration capabilities contribute
significantly to the success and effectiveness of the sign language detection project,
enabling robust and scalable implementation of machine learning algorithms for real-
world applications.

Fig 4.2: Numpy


2. MEDIAPIPE:
In this project, the MediaPipe library plays a crucial role in detecting hand
landmarks from webcam feeds in real-time, providing foundational data for sign
language detection. By leveraging MediaPipe's pre-trained hand landmark detection
models, the project extracts precise coordinates of key landmarks representing hand
gestures directly from the video stream. These landmarks serve as essential features
for training the machine learning model to recognize sign language gestures
accurately.

Furthermore, MediaPipe's integration with Python allows seamless integration
into the project's codebase, facilitating easy access to hand landmark data for further
processing and analysis. Overall, the use of the MediaPipe module empowers the
project with advanced hand tracking capabilities, laying the groundwork for effective
sign language detection systems that can be deployed in real-world applications to
enhance accessibility and communication for individuals with hearing impairments.

Fig 4.3: Media Pipe Library Tracking

3. OpenCV:
In this project, OpenCV (Open Source Computer Vision Library) serves as a
foundational component, providing essential capabilities for image processing, video
analysis, and visualization tasks critical for sign language detection. Leveraging
OpenCV's comprehensive suite of functions, the project accesses and processes video
frames captured from a webcam feed in real-time. OpenCV's video capture
functionality enables seamless retrieval of frames, ensuring a continuous stream of
input data for hand landmark detection and subsequent analysis. Additionally,
OpenCV's rich set of image processing functions, including resizing, color conversion,
and filtering, enables preprocessing of video frames to enhance their quality and
suitability for further analysis. Throughout the project pipeline, OpenCV facilitates the
visualization of video frames and annotated results, allowing for real-time feedback and
evaluation of the system's performance. Moreover, OpenCV seamlessly integrates
with machine learning libraries, enabling the deployment of machine learning models
for tasks such as gesture recognition. Its versatility, efficiency, and extensive feature
set make OpenCV an indispensable tool for various computer vision tasks within the
project, contributing to the development of robust and effective sign language detection
systems capable of enhancing accessibility and communication for individuals with
hearing impairments.

Fig 4.4: Hand Gesture by OpenCV

4.3 DEVELOPMENT ENVIRONMENT USED


VSCODE:
In this project, Visual Studio Code (VS Code) serves as the central hub for
developing, debugging, and managing the codebase. Offering a user-friendly interface
and a plethora of features tailored to modern development workflows, VS Code
streamlines the development process for building a sign language detection system.
The versatile code editor provides essential functionalities such as syntax highlighting,
code completion, and intelligent suggestions, facilitating efficient coding and enhancing
readability. Integration with Git version control simplifies collaboration among team
members, enabling seamless tracking of changes, branching, and merging directly
within the IDE. Moreover, VS Code's extensive ecosystem of extensions enhances its
capabilities, allowing developers to customize their environment with tools and plugins
tailored to specific requirements. For this project, extensions for Python, machine
learning frameworks like TensorFlow, and computer vision libraries such as OpenCV
can be seamlessly integrated, providing additional functionalities such as debugging,
linting, and virtual environment management. The integrated terminal further enhances
productivity by enabling developers to execute commands, run scripts, and manage
project dependencies without leaving the IDE. With built-in support for task automation
and debugging tools, VS Code empowers developers to streamline their workflows,
troubleshoot issues, and iterate on their code effectively.

Overall, Visual Studio Code serves as a robust and efficient development
environment, enabling developers to create, test, and deploy the sign language
detection system with confidence and ease.

Fig 4.5: VS Code

CHAPTER 5
RESULTS AND DISCUSSIONS

5.1. RESULTS:
In this project, voice output has been added for the emergency gestures Water, Washroom, and Food. When one of these three gestures is recognized, the system produces speech that announces the corresponding gesture.

OUTPUT:

Fig 5.1: Output for a Gesture


In this output scenario, the user intends to convey the message "BE STRONG"
through a specific hand gesture depicted above. Initially, the gesture serves as input
data, which is captured and processed by the system. Through a training phase, the
system learns to recognize and interpret this gesture, associating it with the
corresponding message "BE STRONG." This training process involves feeding the
input gesture along with its labeled interpretation into the machine learning model, such
as a Random Forest classifier or a neural network, enabling the model to learn the
underlying patterns and relationships between gestures and their intended meanings.
Once trained, the model becomes capable of accurately predicting the message "BE
STRONG" when presented with similar hand gestures in the future

OUTPUT (2):

Fig 5.2: Output for a Gesture


In this scenario, the input gesture representing an urgent need for water is translated into both audio and text output. The recognition system analyzes the gesture's landmark features to capture its nuances and produce an accurate translation, and because water is identified as an essential emergency requirement, particular emphasis is placed on conveying this message effectively. By combining hand tracking, classification, and speech output, the project bridges communication barriers and provides accessible, precise translations for individuals who rely on sign language, ensuring that crucial messages such as the urgent need for water are conveyed accurately and comprehensively to the deaf and hard of hearing community.
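
The report does not name the speech library used for the audio output; the sketch below uses pyttsx3, a common offline text-to-speech option, purely as an illustration of how a recognized emergency gesture could be spoken aloud:

import pyttsx3

EMERGENCY_GESTURES = {"WATER", "WASHROOM", "FOOD"}   # labels assumed from the text

engine = pyttsx3.init()

def announce(predicted_label):
    # Speak the predicted label aloud if it is one of the emergency gestures.
    if predicted_label.upper() in EMERGENCY_GESTURES:
        engine.say(predicted_label)
        engine.runAndWait()

announce("Water")   # produces the spoken word "Water"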

CHAPTER 6
CONCLUSION AND FUTURE SCOPE

6.1 CONCLUSION

The development of a sign language detection system using machine learning,
particularly employing the Random Forest algorithm alongside the MediaPipe library,
represents a significant step towards enhancing accessibility and communication for
individuals with hearing impairments. Through meticulous data collection,
preprocessing, and model training, the project has demonstrated the feasibility of
accurately interpreting and classifying sign language gestures in real-time. Leveraging
the robustness and interpretability of the Random Forest algorithm, coupled with
MediaPipe's advanced hand landmark detection capabilities, the system achieves high
levels of accuracy and reliability in recognizing diverse sign language gestures. By
bridging the communication gap between sign language users and non-users, this
project contributes to fostering inclusivity and understanding within society. Moving
forward, further research and development in this field hold the potential to refine and
expand the capabilities of sign language detection systems, paving the way for even
greater accessibility and integration for individuals with hearing impairments. As
technology continues to evolve, projects like this serve as invaluable tools in breaking
down barriers and promoting inclusivity in our communities.

6.2 FUTURE SCOPE


The project on sign language detection using machine learning with the Random
Forest algorithm and MediaPipe library lays a solid foundation for future advancements
and extensions in several areas.
Enhanced Accuracy: Continued research and development can focus on improving
the accuracy and robustness of the model. This may involve collecting larger and more
diverse datasets, refining preprocessing techniques, and exploring advanced feature
extraction methods to capture subtle nuances in hand gestures more effectively.

Real-Time Performance: Optimizing the system for real-time performance is essential
for practical applications. Future work could involve implementing parallel processing
techniques, optimizing algorithm parameters, and leveraging hardware acceleration
(e.g., GPUs) to reduce inference time and latency, enabling seamless interaction in
real-world scenarios.

Gesture Recognition Expansion: Expanding the system to recognize a broader
range of sign language gestures and gestures from different sign languages would
increase its utility and inclusivity. Collaborating with sign language experts and
communities to identify and incorporate additional gestures can enrich the system's
vocabulary and adaptability.

Multimodal Integration: Integrating additional sensor modalities, such as depth
sensors or wearable devices, can provide complementary information about hand
movements and gestures, enhancing the system's accuracy and robustness.
Combining visual data from cameras with depth or motion data can improve gesture
recognition performance in various lighting conditions and environments.

User Interface and Accessibility: Developing user-friendly interfaces and
applications tailored to the needs of sign language users can enhance accessibility and
usability. This could include designing intuitive gesture-based interfaces, incorporating
voice feedback or haptic feedback for users with visual impairments, and ensuring
compatibility with assistive technologies.

Deployment in Assistive Technologies: Integrating the sign language detection
system into assistive technologies, such as mobile apps, smart glasses, or
communication devices, can empower individuals with hearing impairments to
communicate more effectively in various settings. Collaborating with organizations and
communities to deploy the system in real-world environments can facilitate its adoption
and impact.

Education and Training: Utilizing the system as an educational tool for learning sign
language or training interpreters can promote language acquisition and proficiency
among learners. Developing interactive tutorials, games, or immersive learning
experiences based on the sign language detection system can make learning sign
language more engaging and accessible to a broader audience.

Overall, the future scope of this project encompasses a wide range of opportunities for
advancing sign language detection technology, promoting inclusivity, and empowering
individuals with hearing impairments to communicate more effectively in diverse
settings.

REFERENCES

[1] G. Pala, J. B. Jethwani, S. S. Kumbhar, and S. D. Patil, in Proceedings of the International Conference on Artificial Intelligence and Smart Systems (ICAIS-2021), IEEE Xplore Part Number: CFP21OAB-ART.
[2] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," in Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998, doi: 10.1109/5.726791.
[3] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L. Chen, "MobileNetV2: Inverted Residuals and Linear Bottlenecks," 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510-4520, doi: 10.1109/CVPR.2018.00474.
[4] L. K. Hansen and P. Salamon, "Neural network ensembles," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 10, pp. 993-1001, Oct. 1990, doi: 10.1109/34.58871.
[5] B. Kang, S. Tripathi, and T. Q. Nguyen, "Real-time sign language fingerspelling recognition using convolutional neural networks from depth map," arXiv preprint arXiv:1509.03001, 2015.
[6] R. Suganya and T. Meeradevi, "Design of a communication aid for physically challenged," in 2015 2nd International Conference on Electronics and Communication Systems (ICECS), pp. 818-822, IEEE, 2015.
[7] S. Upendran and A. Thamizharasi, "American Sign Language Interpreter System for Deaf and Dumb Individuals," 2014 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT), 2014.
[8] D. H. Wolpert, "Stacked generalization," Neural Networks, vol. 5, no. 2, pp. 241-259, 1992, doi: 10.1016/S0893-6080(05)80023-1.

