
A

Major Project Report


on

Sign Language Detection Using Computer Vision

Submitted for the partial fulfillment of the requirement for the


award of the degree of

Bachelor of Technology
in
Computer Science & Engineering

Submitted to:                                   Submitted by:
Prof. Sunila Godara                             Jayant Kumar
Deptt. of CSE                                   200010130050
GJUS&T, Hisar                                   B.Tech (CSE) – 8th Sem

Department of Computer Science & Engineering


Guru Jambheshwar University of Science & Technology, Hisar
‘A’ Grade NAAC Accredited
2017-2021
CANDIDATE’S DECLARATION

I hereby declare that the project work entitled “Sign Language Detection Using Computer
Vision” is an authentic work carried out by me under the guidance of Prof. Sunila Godara,
Department of Computer Science & Engineering, in partial fulfillment of the requirement for the
award of the degree of Bachelor of Technology in Computer Science & Engineering, and that it
has not been submitted anywhere else for any other degree.

Date: Signature

Jayant Kumar

200010130050

CERTIFICATE

This is to certify that Jayant Kumar (200010130050), a student of B.Tech (CSE), Department
of Computer Science & Engineering, Guru Jambheshwar University of Science & Technology,
Hisar, has completed the project entitled “Sign Language Detection Using Computer Vision”.

Prof. Sunila Godara


Deptt. of CSE
GJUS&T, Hisar

List of Figures

List of Tables
Abstract

This project presents a method for sign language detection using computer vision techniques,
MediaPipe, hand gesture detection, and Google Teachable Machine.

Sign language is a vital communication tool for the deaf and hard-of-hearing community, but
it remains largely inaccessible to those unfamiliar with it.

The approach addresses this gap by leveraging advanced technologies to accurately interpret
and translate sign language gestures into text or speech in real time.

We employ MediaPipe, a robust framework developed by Google, for real-time hand tracking and
gesture recognition. MediaPipe’s hand detection module identifies key landmarks on the hands,
allowing precise tracking of finger positions and movements.

These hand landmarks are then processed to detect specific sign language gestures.

To enhance the accuracy and reliability of the system, we integrate Google Teachable Machine,
a machine learning platform that allows the creation of custom models without extensive coding.

By training models on a comprehensive dataset of sign language gestures, the system learns
to recognize and differentiate among diverse signs with high accuracy.

The proposed system was evaluated via extensive testing, showing promising results in terms of
accuracy, speed, and user-friendliness.

The project demonstrates the potential of real-time sign language interpretation, offering a
practical solution for bridging communication barriers and fostering inclusivity for the deaf
and hard-of-hearing community.

Contents

Sign Language Detection Using Computer Vision

1. Introduction
1.1 Problem Definition
1.2 Objectives
2. Existing System
3. Problems in the existing system
4. Proposed System
5. Advantages of the proposed system
6. Software requirement specification document
7. Design of the proposed system
8. Implementation (Coding)
9. Testing
10. User’s Manual
11. Conclusions
12. References / Bibliography
13. Plagiarism Report

1. Introduction
Deaf and hard-of-hearing people mainly use sign language to communicate through hand gestures.
People who are not familiar with it often cannot understand it, creating communication barriers.
By enabling real-time sign language translation and recognition, machine learning (ML) paves the
way for innovative solutions to bridge this gap.

Hand gestures and sign language

American Sign Language (ASL) is a natural language, complete with its own syntax, grammar,
and lexicon. Signers use hand gestures, facial expressions, and body movements to convey meaning.
The main components of a hand gesture are:

1. Handshape: the configuration of the fingers and hand.

2. Palm orientation: the direction the palm faces.

3. Movement: the motion of the hand, including its speed and direction.

4. Location: where the sign is made relative to the body.

5. Non-manual signals: facial expressions and body postures that provide additional grammatical
information.

Fig. 1 (a) Sign language; Fig. 1 (b) Description of hand gesture
The role of machine learning in sign language recognition

Machine learning, a part of artificial intelligence (AI), involves algorithms that recognize
patterns and make decisions based on data. In the context of sign language recognition, ML models
can be trained to recognize and interpret hand movements from visual data.

Fig. 1 (c) Descriptions of different hand gestures

Basic technology and framework

1. Computer Vision: This technology allows computers to interpret and process visual
information from the world. For sign language recognition, computer vision algorithms analyze
video footage to detect and track hand movements and posture.

2. MediaPipe: Developed by Google, MediaPipe is a framework that provides real-time hand
tracking capabilities. It detects key landmarks on the hand, which allows accurate tracking of
finger positions and movements. This information is necessary for accurate hand gesture
recognition (a short landmark-extraction sketch is given after this list).

3. Google Teachable Machine: This platform allows you to create custom machine learning
models without extensive coding. By training models on a large database of sign gestures, the
system learns to recognize individual signs with high accuracy.
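
As an illustration of how MediaPipe exposes hand landmarks, the following minimal Python sketch
extracts the 21 landmarks from a single image. It assumes the mediapipe and opencv-python
packages are installed; the file name sign.jpg is only a placeholder.

```python
# Minimal sketch: extracting hand landmarks from one image with MediaPipe.
# Assumes the mediapipe and opencv-python packages are installed; "sign.jpg"
# is only an example file name.
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

image = cv2.imread("sign.jpg")                    # BGR image from disk
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)      # MediaPipe expects RGB input

with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
    results = hands.process(rgb)

if results.multi_hand_landmarks:
    # 21 landmarks per hand, each with normalized x, y, z coordinates.
    landmarks = results.multi_hand_landmarks[0].landmark
    features = [(lm.x, lm.y, lm.z) for lm in landmarks]
    print(f"Extracted {len(features)} landmarks")  # 63 values can feed a classifier
else:
    print("No hand detected")
```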

The process of recognizing sign language using machine learning

1. Data collection: Collect a large set of video clips covering different signs. This dataset
should include different handshapes, gestures, and orientations.

2. Preprocessing: Use computer vision techniques to preprocess the video data. This includes
detecting the hand, segmenting the hand region, and identifying key landmarks.

3. Feature extraction: Extract features such as hand position, motion trajectory, and
orientation from the preprocessed data. These features are the inputs to the machine learning
models.

4. Model training: Use the extracted features to train machine learning models. Common models
include convolutional neural networks (CNNs) for image data and recurrent neural networks
(RNNs) for sequential data.

5. Real-time recognition: Deploy the trained model in a real-time system that records live
video, detects hand gestures, and translates them into text or speech (sketched in the loop below).
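
The loop below is an illustrative sketch of steps 2–5: it reads webcam frames with OpenCV,
extracts landmarks with MediaPipe, and passes them to a classifier. The classify_gesture
function is a hypothetical stand-in for whatever trained model the system uses.

```python
# Illustrative real-time loop: webcam capture, MediaPipe landmarks, classification.
# classify_gesture() is a hypothetical placeholder for the trained model.
import cv2
import mediapipe as mp

def classify_gesture(landmarks):
    # Placeholder: a real system would feed the 63 landmark values to a model.
    return "HELLO"

mp_hands = mp.solutions.hands
cap = cv2.VideoCapture(0)

with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.5) as hands:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            lm = results.multi_hand_landmarks[0].landmark
            label = classify_gesture([(p.x, p.y, p.z) for p in lm])
            cv2.putText(frame, label, (10, 40),
                        cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 255, 0), 2)
        cv2.imshow("Sign Language Detection", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):   # press q to quit
            break

cap.release()
cv2.destroyAllWindows()
```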

Fig. 1 (d) Sample of sign language
1.1 Problem Definition

Background

Sign language relies on hand gestures, facial expressions, and body movements. Despite its
significance, it remains largely inaccessible to people who do not know it, creating a significant
communication barrier. Traditional methods of sign language interpretation, such as human
interpreters, are not always available and can be expensive.

There is a pressing need for an automated system that can accurately and efficiently translate
sign language into text or speech in real time.

Problem Statement

The primary purpose is to develop an automated system for real-time sign language detection
and translation using machine learning. This system must be able to accurately recognize and
interpret hand gestures corresponding to distinct sign language words and phrases, producing
text or speech output that can be easily understood by non-signers.

1. Complexity of Hand Gestures

• Wide Range of Gestures: Sign language consists of numerous handshapes, movements,
orientations, and locations, each with specific meanings. Recognizing these varied
gestures accurately is challenging because the system must discern subtle differences
between similar signs.
• Intricacies of Movement: The dynamic nature of hand movements, including speed,
trajectory, and flow, adds another layer of complexity. Gestures can involve complex
sequences of movements that need precise interpretation.

2. Variability

• Different Sign Languages: Each sign language, such as American Sign Language (ASL)
and British Sign Language (BSL), has its unique set of gestures and grammatical rules.
This necessitates the system to be versatile and adaptable to multiple sign languages.
• Individual Differences: Variations in signing styles, speeds, and personal nuances
among individuals can lead to inconsistencies. Additionally, factors like hand size, shape,
and flexibility can affect how gestures are performed and perceived.
• Environmental Factors: Changes in lighting conditions, background noise, and visual
distractions can significantly impact the system’s accuracy. The system must be robust
enough to handle different environments and conditions.

3. Real-time Processing

• High Computational Demand: Processing video feeds in real time requires efficient
algorithms and powerful hardware. The system must analyze frames quickly to provide
instantaneous translations without lag.
• Optimization: Balancing accuracy and speed is critical. The algorithms must be
optimized to ensure that they are fast enough for real-time use while maintaining high
accuracy levels.

4. Non-manual Signals

• Facial Expressions: Facial expressions play a crucial role in sign language, conveying
emotions, questions, negations, and other grammatical elements. Capturing and
interpreting these expressions accurately is essential but challenging.

5. Training Data

• Diverse Dataset: A large and diverse dataset is required to train machine learning models
effectively. The dataset must include various sign languages, gestures, and environmental
conditions to ensure the model’s robustness.
• Data Collection and Annotation: Collecting and annotating a comprehensive dataset is
time-consuming and resource-intensive. Ensuring the quality and accuracy of the data is
crucial for effective model training.
• Balancing the Dataset: The dataset should be balanced to include an equal representation
of different gestures and non-manual signals to prevent bias in the trained models.

1.2 Objectives

1. Accurate Hand Gesture Detection

The objective is to develop robust computer vision algorithms to accurately detect hand
movements and positions.

The aim is to achieve high precision in identifying diverse handshapes, orientations, movements,
and locations.

2. Comprehensive Feature Extraction

The objective is to extract distinctive features from hand gestures, including handshape, motion
trajectory, orientation, and location.

The aim is to ensure that the extracted features comprehensively capture the nuances of
different sign language gestures for accurate recognition.

3. Effective Model Training

The objective is to train machine learning models on a comprehensive and diverse dataset of
sign language gestures.

The aim is to employ advanced techniques such as convolutional neural networks (CNNs) and
recurrent neural networks (RNNs) to achieve high recognition accuracy.

4. Real-Time Translation

The objective is to develop a system capable of processing video feeds and translating detected
gestures into text or speech in real time.

The aim is to optimize the system for speed and efficiency to ensure a seamless and
instantaneous user experience.

5. User-Friendly Interface

The objective is to design an intuitive and accessible user interface that enables easy
interaction for both sign language users and non-signers.

The aim is to create a clear and responsive interface that correctly presents translations of
gestures.

6. Extensive Testing and Validation

The objective is to conduct rigorous testing and validation of the system under various
conditions, including different lighting, backgrounds, and user variations.

The aim is to ensure that the system maintains high accuracy and performance across diverse
scenarios and real-world environments.

7. Accessibility and Inclusivity

The objective is to create a solution that enhances communication accessibility for the deaf
and hard-of-hearing community.

The aim is to bridge the communication gap between sign language users and people unfamiliar
with sign language, fostering greater inclusivity.

8. Scalability and Future Enhancements

The objective is to design the system to be scalable and adaptable for future upgrades and
expansions.

The aim is to plan for continuous updates and enhancements based on user feedback and
technological advancements, ensuring long-term relevance and effectiveness.

By accomplishing these objectives, the project aims to develop a highly effective and inclusive
sign language detection system that leverages the power of machine learning to facilitate
real-time communication and foster greater inclusivity for the deaf and hard-of-hearing
community.

2. Existing System
Existing sign language detection systems are basic systems that use machine learning; they only
detect a sign from a given input image and do not run efficiently in real time.

Let us understand step by step how such a system works:

The pipeline has three stages: Data Processing, Training, and Classify Gesture. The block diagram
is simplified to abstract away some of the minutiae:

• Data Processing: The load_data.py script contains functions to load the raw image data and
save the image data as NumPy arrays to file storage. The process_data.py script loads the
image data from data.npy and preprocesses the images by resizing/rescaling them and applying
filters and ZCA whitening to enhance features. During training, the processed image data is
split into training, validation, and testing data and written to storage. Training also involves a
load_dataset.py script that loads the relevant data split into a Dataset class. For use of the
trained model in classifying gestures, an individual image is loaded and processed from the
filesystem.

• Training: The training loop for the model is contained in train_model.py. The model is trained
with hyperparameters obtained from a config file that lists the learning rate, batch size, image
filtering, and number of epochs. The configuration used to train the model is saved along with the
model architecture for future evaluation and tweaking for improved results. Within the training
loop, the training and validation datasets are loaded as DataLoaders and the model is trained
using the Adam optimizer with cross-entropy loss. The model is evaluated every epoch on the
validation set, and the model with the best validation accuracy is saved to storage for further
evaluation and use. Upon finishing training, the training and validation error and loss are saved
to disk, along with a plot of error and loss over training (a condensed sketch of such a loop is
given after this list).

• Classify Gesture: After a model has been trained, it can be used to classify a new ASL gesture
that is available as a file on the filesystem. The user inputs the filepath of the gesture image,
and the test_data.py script passes the filepath to process_data.py to load and preprocess the
file in the same way as the training data.
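
The following is a condensed sketch of the kind of training loop described above, not the
project's actual train_model.py: Adam optimizer, cross-entropy loss, per-epoch evaluation on the
validation split, and saving the weights with the best validation accuracy. The model and
DataLoader objects are assumed to be created elsewhere.

```python
# Condensed training-loop sketch (illustrative, not the actual train_model.py).
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, epochs=20, lr=1e-3, device="cpu"):
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_acc = 0.0

    for epoch in range(epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()

        # Evaluate on the validation split after every epoch.
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for images, labels in val_loader:
                images, labels = images.to(device), labels.to(device)
                preds = model(images).argmax(dim=1)
                correct += (preds == labels).sum().item()
                total += labels.size(0)
        acc = correct / total

        # Keep only the weights with the best validation accuracy.
        if acc > best_acc:
            best_acc = acc
            torch.save(model.state_dict(), "best_model.pt")
        print(f"epoch {epoch + 1}: val accuracy {acc:.3f}")
```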

Fig 2 (a) Flowchart of Process

• Data Collection: The primary source of data for this project was a compiled dataset of
American Sign Language (ASL) called the ASL Alphabet from Kaggle. The dataset comprises
images for 29 classes: 26 for the letters A-Z and 3 for space, delete, and nothing. This data is
solely of the user Akash gesturing in ASL, with the images taken from his laptop’s webcam.
These photos were then cropped, rescaled, and labeled for use (Figure 2 shows example images
from the Kaggle dataset, e.g. the ASL letters A, E, H, and Y). Additional test sets of images
were taken with a webcam under different lighting conditions, backgrounds, and use of the
dominant/non-dominant hand. These images were then cropped and preprocessed. A sketch of
loading and splitting such a dataset is given below.
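
This minimal sketch assumes the Kaggle images are unpacked into one folder per class; the folder
name and split ratios are illustrative, not taken from the report.

```python
# Illustrative loading of an ASL Alphabet-style dataset (29 class folders:
# A-Z, space, del, nothing) and splitting it into train/validation/test sets.
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((64, 64)),   # rescale every image to a fixed size
    transforms.ToTensor(),
])

dataset = datasets.ImageFolder("asl_alphabet_train", transform=transform)

n_total = len(dataset)
n_train = int(0.8 * n_total)
n_val = int(0.1 * n_total)
n_test = n_total - n_train - n_val
train_set, val_set, test_set = torch.utils.data.random_split(
    dataset, [n_train, n_val, n_test])

train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)
print(f"{len(dataset.classes)} classes, {n_train}/{n_val}/{n_test} split")
```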

Fig. 2 (b) Example hand gesture images

• Data Pre-processing: The data preprocessing was done using the Pillow library, an
image-processing library, and the sklearn.decomposition module, which is useful for its matrix
optimization and decomposition functionality.

Fig. 2 (c) Sample of data used for processing

• Image Enhancement: A combination of brightness, contrast, sharpness, and color enhancement
was used on the images. For example, the contrast and brightness were changed such that fingers
could be distinguished when the image was very dark (these steps are combined with whitening
in the sketch after the Image Whitening item below).

• Edge Enhancement: Edge enhancement is an image filtering technique that makes edges more
defined. This is achieved by increasing the contrast in a local region of the image that is
detected as an edge. This makes the border of the hand and fingers, versus the background,
much clearer and more distinct, which can potentially help the neural network identify the hand
and its boundaries.

• Image Whitening: ZCA, or image whitening, is a technique that uses the singular value
decomposition of a matrix. The algorithm decorrelates the data and removes redundant, or
obvious, information from it. This allows the neural network to look for more complex and
sophisticated relationships and to uncover the underlying structure of the patterns it is being
trained on. After whitening, the covariance matrix of the image data is the identity and the
mean is zero.
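
The sketch below combines the preprocessing steps above: Pillow-based brightness, contrast, and
edge enhancement followed by ZCA whitening in NumPy. The enhancement factors and the 64 × 64
image size are illustrative assumptions, not the project's actual settings.

```python
# Preprocessing sketch: Pillow enhancement followed by ZCA whitening.
# Enhancement factors and image size are illustrative values.
import numpy as np
from PIL import Image, ImageEnhance, ImageFilter

def enhance(path):
    img = Image.open(path).convert("L").resize((64, 64))
    img = ImageEnhance.Brightness(img).enhance(1.3)   # brighten dark frames
    img = ImageEnhance.Contrast(img).enhance(1.5)     # separate fingers from background
    img = img.filter(ImageFilter.EDGE_ENHANCE)        # sharpen hand boundaries
    return np.asarray(img, dtype=np.float64).flatten()

def zca_whiten(X, eps=1e-5):
    """X: (n_samples, n_features). Returns zero-mean, decorrelated data."""
    X = X - X.mean(axis=0)                  # zero mean per feature
    cov = np.cov(X, rowvar=False)           # feature covariance matrix
    U, S, _ = np.linalg.svd(cov)            # eigendecomposition via SVD
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T
    return X @ W                            # covariance of result is ~identity

# Usage: stack enhanced images into a matrix, then whiten the whole set, e.g.
# X = np.stack([enhance(p) for p in image_paths]); X_white = zca_whiten(X)
```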

Fig. 2 (d) Hand image after processing
Machine Learning Model

Overall Structure

The model used in this classification task is a fairly basic implementation of a Convolutional
Neural Network (CNN). As the project requires classification of images, a CNN is the go-to
architecture. The basis for the model design came from the paper Using Deep Convolutional
Networks for Gesture Recognition in American Sign Language, which accomplished a similar
ASL gesture classification task [4].

The model consists of convolutional blocks containing two 2D convolutional layers with ReLU
activation, followed by max pooling and dropout layers.

These convolutional blocks are repeated 3 times and followed by fully connected layers that
eventually classify into the required categories. The kernel sizes are maintained at 3 × 3
throughout the model. The dropout layers on the fully connected layers were omitted at first to
allow for faster training and to establish a baseline without dropout.

Fig 2(e) Process of Model

Fig. 2 (e) shows the model architecture as implemented in Using Deep Convolutional Networks
for Gesture Recognition in American Sign Language. A smaller model was also built to compare
with the model in the paper. This model was designed to train faster and to establish a baseline
for problem complexity. The smaller model was built with only one “block” of convolutional
layers, consisting of two convolutional layers with variable kernel sizes progressing from
5 × 5 to 10 × 10, ReLU activation, and the usual max pooling and dropout. This fed into three
fully connected layers which output into the 29 classes of letters. The variation of the kernel
sizes was motivated by the dataset including the background, whereas the paper preprocessed
their data to remove the background. The design followed the thinking that the first layer with
a smaller kernel would capture smaller features such as hand outline, finger edges, and shadows,
while the larger kernel would capture combinations of the smaller features like finger crossing,
angles, and hand location (a sketch of this smaller model is given below).
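
A PyTorch sketch of this smaller comparison model is given below. The channel counts and the
64 × 64 input size are assumptions made for illustration; the report only fixes the kernel sizes,
the single convolutional block, the three fully connected layers, and the 29 output classes.

```python
# Sketch of the smaller comparison model: one convolutional block (5x5 then
# 10x10 kernels, ReLU, max pooling, dropout) feeding three fully connected
# layers that output 29 classes. Channel counts and input size are assumptions.
import torch
import torch.nn as nn

class SmallASLNet(nn.Module):
    def __init__(self, num_classes=29):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, padding=2),    # small kernel: edges, outlines
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=10, padding=5),  # large kernel: finger configurations
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Dropout(0.25),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 32 * 32, 256),   # 32 channels x 32 x 32 for a 64x64 input
            nn.ReLU(),
            nn.Linear(256, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SmallASLNet()
print(model(torch.randn(1, 3, 64, 64)).shape)   # torch.Size([1, 29])
```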


Fig 2 (f) Process of Model

3. Problems in the existing system

In-Depth Analysis of Sign Language Detection Using Machine Learning

Table of Contents

3.1. Introduction
    3.1.1 Overview
    3.1.2 Importance of Accurate Sign Language Recognition
    3.1.3 Scope of the Discussion
3.2. Data Challenges
    3.2.1 Data Scarcity
    3.2.2 Data Variability
    3.2.3 Class Imbalance
3.3. Model Complexity
    3.3.1 Gesture Similarity
    3.3.2 Dynamic Gestures
3.4. Feature Extraction
    3.4.1 Hand and Finger Detection
    3.4.2 Pose Estimation
3.5. Model Training
    3.5.1 Overfitting
    3.5.2 Computational Resources
3.6. Real-time Implementation
    3.6.1 Latency
    3.6.2 Hardware Limitations
3.7. Cultural and Linguistic Variations
    3.7.1 Sign Language Variants
    3.7.2 Contextual Understanding

3.1. Introduction

3.1.1 Overview

Sign language detection is a complex task that involves interpreting hand gestures, facial
expressions, and body language to translate them into verbal language or text. With
advancements in machine learning and computer vision, researchers are developing systems
that can recognize and interpret sign language from images. These systems hold great
potential for improving accessibility and communication for the deaf and hard-of-hearing
communities.

3.1.2 Importance of Accurate Sign Language Recognition

Accurate sign language recognition systems can transform various sectors by making them
more inclusive. In education, such systems can assist teachers and students by providing real-
time translation. In healthcare, they can enable better communication between patients and
medical professionals. In public services, they can facilitate interactions in places like banks,
government offices, and transportation hubs.

3.1.3 Scope of the Discussion

This document aims to provide a comprehensive exploration of the challenges and solutions
in developing machine learning models for sign language detection from photos. It will cover
data challenges, model complexity, feature extraction, model training, real-time
implementation, cultural and linguistic variations, evaluation and benchmarking, case studies,
and future directions.

3.2 Data Challenges

3.2.1 Data Scarcity

• Problem - High-quality, labeled datasets are crucial for training machine learning models.
However, the availability of such datasets for sign language detection is limited.
Collecting these datasets requires capturing a wide range of signs performed by various
individuals under different conditions. This process is resource-intensive and time-
consuming. Existing datasets, such as RWTH-PHOENIX-Weather 2014 for German Sign
Language and the American Sign Language (ASL) dataset, are valuable but insufficient
for comprehensive model training.

3.2.2 Data Variability

• Problem - Variations in lighting, background, camera angles, and individual differences
in hand shapes, sizes, and skin tones add complexity to the dataset. These variations can
affect the consistency and accuracy of the models. For instance, the same sign performed
under different lighting conditions or from different angles may be interpreted differently
by the model.

3.2.3 Class Imbalance

• Problem - In many datasets, some signs are more frequently used and thus more
represented than others, leading to class imbalance. This imbalance can cause models to
perform poorly on underrepresented signs. For instance, common signs like "hello" or
"thank you" might be well-represented, while less common signs might be scarce,
resulting in biased model performance.
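
One common mitigation, given here only as an illustration rather than as the report's approach,
is to weight the loss inversely to class frequency so that rare signs contribute as much to
training as common ones. The labels array below is a toy example.

```python
# Illustrative class-weighted loss for imbalanced sign datasets.
import numpy as np
import torch
import torch.nn as nn

labels = np.array([0, 0, 0, 1, 2, 2])             # toy example: class 1 is rare
counts = np.bincount(labels)                      # samples per class
weights = counts.sum() / (len(counts) * counts)   # inverse-frequency weights

criterion = nn.CrossEntropyLoss(
    weight=torch.tensor(weights, dtype=torch.float32))
# Pass `criterion` to the training loop in place of the unweighted loss.
```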

3.3. Model Complexity

3.3.1 Gesture Similarity

• Problem - Many signs have subtle differences, making it difficult for models to
distinguish between them. For example, the signs for “I love you” and “rock on” in ASL
are very similar, involving slight differences in finger positioning. Detecting these
nuances requires highly sensitive models that can capture fine-grained details.

3.3.2 Dynamic Gestures

• Problem - Sign language includes both static poses and dynamic gestures involving
movement. Capturing and interpreting these movements from a single image is
challenging, as it lacks temporal information. Dynamic gestures, such as those involving
movement from one hand position to another, require understanding the sequence of
frames.

3.4 Feature Extraction

3.4.1 Hand and Finger Detection

• Problem - Accurately detecting and isolating hands and fingers in images is complex,
especially in cluttered or noisy backgrounds. Misidentification can lead to poor feature
extraction and inaccurate recognition. For example, overlapping hands or occlusions by
objects can hinder accurate detection.

3.5. Model Training

3.5.1 Overfitting

• Problem - Overfitting occurs when a machine learning model learns the training data too
well, including the noise and outliers, which leads to poor generalization on new, unseen
data. This is a common issue in deep learning, especially with complex models and
limited data.

Causes -

• High model complexity: Deep neural networks with numerous parameters can fit the training data
perfectly but fail to generalize.
• Limited data: Insufficient training data can cause the model to memorize rather than learn
patterns.
• Lack of regularization: Absence of techniques to penalize complexity can lead to overfitting.
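
The sketch below illustrates three standard countermeasures, dropout, L2 weight decay, and early
stopping; the model shape and the specific values are illustrative and not the report's
configuration.

```python
# Standard countermeasures to overfitting (illustrative values only).
import torch
import torch.nn as nn

model = nn.Sequential(                      # toy classifier with a dropout layer
    nn.Flatten(),
    nn.Linear(64 * 64 * 3, 256),
    nn.ReLU(),
    nn.Dropout(0.5),                        # randomly zero activations during training
    nn.Linear(256, 29),
)

# L2 regularization via the optimizer's weight_decay term.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

def validation_loss(epoch):
    # Placeholder: a real run would evaluate the model on held-out data.
    return 1.0 / (epoch + 1)

# Early stopping: halt when validation loss stops improving for `patience` epochs.
best_val, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(100):
    val_loss = validation_loss(epoch)
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
```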

3.5.2 Computational Resources

• Problem - Training deep learning models for sign language detection is computationally
intensive, requiring substantial processing power and memory. Limited access to
powerful GPUs or TPUs can hinder the training process and slow down research
progress.

3.6. Real-time Implementation

3.6.1 Latency

• Problem - For practical applications, sign language detection systems need to operate in
real time, meaning they must process and interpret signs with minimal delay. High
latency can make these systems less effective and frustrating to use.

3.6.2 Hardware Limitations


• Problem - Deploying sign language detection models on devices with limited
computational resources, such as mobile phones, embedded systems, or IoT devices,
presents a unique set of challenges.

3.7 Cultural and Linguistic Variations

3.7.1 Sign Language Variants

Problem - Sign languages are not universal; different regions and communities use distinct
versions of sign language, each with its own vocabulary, grammar, and nuances. This diversity
presents a significant challenge for developing models that can accurately recognize and interpret
signs across different languages and dialects.

Challenges:

• Dataset diversity: Collecting comprehensive datasets that cover the variations in different sign
languages.
• Model generalization: Ensuring models can generalize across different dialects and regional
variations.
• Community involvement: Engaging with diverse communities to gather data and validate models.

4. Proposed System
