
International Journal of Scientific Research in Computer Science, Engineering and Information Technology

ISSN : 2456-3307 (www.ijsrcseit.com)


doi : https://doi.org/10.32628/CSEIT206623
Yoga Pose Detection and Classification Using Deep Learning
Deepak Kumar, Anurag Sinha
Department of Information Technology, Research Scholar, Amity University, Jharkhand, India

ABSTRACT

Article Info
Volume 6, Issue 6
Page Number: 160-184
Publication Issue : November-December-2020

Article History
Accepted : 15 Nov 2020
Published : 28 Nov 2020

Yoga is an ancient science and discipline that originated in India 5000 years ago. It is used to bring harmony to both body and mind with the help of asana, meditation and various breathing techniques, and it brings peace to the mind. Due to the increase of stress in the modern lifestyle, yoga has become popular throughout the world. There are various ways through which one can learn yoga. Yoga can be learnt by attending classes at a yoga centre or through home tutoring. It can also be self-learnt with the help of books and videos. Most people prefer self-learning, but it is hard for them to find the incorrect parts of their yoga poses by themselves. Using the proposed system, the user can select the pose that he/she wishes to practice. He/she can then upload a photo of themselves doing the pose. The pose of the user is compared with the pose of the expert, and the difference in angles of various body joints is calculated. Based on this difference of angles, feedback is provided to the user so that he/she can improve the pose.

Keywords : Pose, Self-Learning, PoseNet, Deep Learning, Pose Classification.
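The angle comparison described in the abstract can be illustrated with a short sketch. It assumes that each pose is available as a dictionary of 2D keypoints and that a joint angle is measured at the middle of three keypoints. The function names, the keypoint names, the coordinates and the 15 degree tolerance are hypothetical choices made for illustration and are not taken from the paper.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("coincident keypoints: angle is undefined")
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding error
    return math.degrees(math.acos(cos_t))


def angle_feedback(user, expert, joints, tolerance_deg=15.0):
    """Return {joint: user angle minus expert angle} for joints outside the tolerance."""
    feedback = {}
    for name, (a, b, c) in joints.items():
        diff = joint_angle(user[a], user[b], user[c]) - joint_angle(
            expert[a], expert[b], expert[c]
        )
        if abs(diff) > tolerance_deg:
            feedback[name] = diff
    return feedback


# Hypothetical joint definition: the elbow angle is measured between
# the shoulder and the wrist.
JOINTS = {"left_elbow": ("left_shoulder", "left_elbow", "left_wrist")}

# Hypothetical keypoints: the expert holds a straight arm, the user a bent arm.
expert = {"left_shoulder": (0.0, 0.0), "left_elbow": (1.0, 0.0), "left_wrist": (2.0, 0.0)}
user = {"left_shoulder": (0.0, 0.0), "left_elbow": (1.0, 0.0), "left_wrist": (1.0, 1.0)}

print(angle_feedback(user, expert, JOINTS))
```

A negative difference means the user's joint is more closed than the expert's, which a feedback message could turn into a hint such as "straighten the left elbow".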

I. INTRODUCTION

Human pose estimation is a challenging problem in the discipline of computer vision. It deals with the localization of human joints in an image or video to form a skeletal representation. Automatically detecting a person's pose in an image is a difficult task, as it depends on a number of aspects such as the scale and resolution of the image, illumination variation, background clutter, clothing variations, surroundings, and the interaction of humans with the surroundings [1]. An application of pose estimation which has attracted many researchers in this field is exercise and fitness. One form of exercise with intricate postures is yoga, an age-old practice that started in India but is now famous worldwide because of its many spiritual, physical and mental benefits [2].

The problem with yoga, however, is that, just like any other exercise, it is of utmost importance to practice it correctly, as any incorrect posture during a yoga session can be unproductive and possibly detrimental. This leads to the necessity of having an instructor to supervise the session and correct the individual's posture. Since not all users have access or resources to an instructor, an artificial intelligence based application might be used to identify yoga poses and provide personalized feedback to help individuals improve their form [2].

Copyright: © the author(s), publisher and licensee Technoscience Academy. This is an open-access article distributed
under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-
commercial use, distribution, and reproduction in any medium, provided the original work is properly cited
Deepak Kumar et al Int J Sci Res CSE & IT, November-December-2020; 6 (6) : 160-184

In recent years, human pose estimation has benefited greatly from deep learning, and huge gains in performance have been achieved [3]. Deep learning approaches provide a more straightforward way of mapping the structure instead of having to deal with the dependencies between structures manually. [4] used deep learning to identify 5 exercise poses: pull up, swiss ball hamstring curl, push up, cycling and walking. However, using this method for yoga poses is a relatively newer application [2].

This project focuses on exploring the different approaches for yoga pose classification and seeks to attain insight into the following: What is pose estimation? What is deep learning? How can deep learning be applied to yoga pose classification in real time? This project uses references from conference proceedings, published papers, technical reports and journals. Fig. 1 gives a graphical overview of the topics this paper covers. The first section of the project talks about the history and importance of yoga. The second section talks about pose estimation, explains the different types of pose estimation methods in detail, and goes one level deeper to explain the discriminative methods: learning based (deep learning) and exemplar. Different pose extraction methods are then discussed along with deep learning based models: Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).

II. HISTORY

Humans are prone to musculoskeletal disorders with aging and accidents [5]. In order to prevent this, some form of physical exercise is needed. Yoga, which is a physical and spiritual exercise, has gained tremendous significance in the community of medical researchers. Yoga has the ability to completely cure diseases without any medicines and to improve physical and mental health [6]. A vast body of literature on the medical applications of yoga has been generated, which includes positive body image intervention, cardiac rehabilitation, mental illness etc. [6]. Yoga comprises various asanas, which represent physical static postures. The application of pose estimation to yoga is challenging, as it involves complex configurations of postures. Furthermore, some state-of-the-art methods fail to perform well when the asana involves a horizontal body posture or when both legs overlap each other. Hence, the need arises to develop a robust model which can help popularize self-instructed yoga systems.

III. HUMAN POSE ESTIMATION

Human pose recognition has made huge advancements in the past years. It has evolved from 2D to 3D pose estimation and from single-person to multi-person pose estimation. [16] uses pose estimation to build a machine learning application that detects shoplifters, while [17] uses a single RGB camera to capture 3D poses of multiple people in real time. Human pose estimation algorithms can be broadly organized in two ways. Algorithms that model the estimation of human poses as a geometric calculation are classified as generative methods, while algorithms that model human pose estimation as an image processing problem are classified as discriminative methods [7]. Another way of classifying these algorithms is based on their method of working. Algorithms starting from a more

Volume 6, Issue 6, November-December-2020 | http://ijsrcseit.com



abstract, higher level and moving downwards are called top-down methods, while algorithms that start with pixels and move upwards are called bottom-up methods [8].

A. GENERATIVE
Generative methods provide a technique to predict the features from a given pose hypothesis. They start by initializing the pose of the human body and projecting it onto the image plane. Adjustments are then made to make the projected image and the current image observations consistent. Generative approaches generalize easily because they have less need for a training pose dataset [9]. However, because of the search in a high-dimensional projection space, this method is not considered computationally feasible and is therefore slower than discriminative methods. [10] describes a generative Bayesian method to track 3D segmented human body figures in videos. This is a probabilistic method which consists of a generative model for image appearance, a prior probability distribution over joint angles and pose that represents the motion of humans, and a robust likelihood function. Although the method can track humans in unknown complex backgrounds, it faces the risk of eventually losing track of the object.

B. DISCRIMINATIVE
In contrast to generative methods, discriminative methods start with the evidence of the image and learn a technique to model the relationship between the human poses and the evidence on the basis of training data. Model testing in discriminative methods is much faster than in generative methods because the search happens in a constrained space rather than in a high-dimensional feature space [7]. [11] explores a discriminative, learning-based method to obtain 3D human pose from silhouettes. This approach requires neither an explicit body model nor any prior labeled parts of the body in the image. It recovers the pose using non-linear regression on shape descriptor vectors extracted automatically from image silhouettes. It uses Relevance Vector Machine (RVM) regressors and damped least squares for regression [11]. The method, though it increases the accuracy several times over, is not accurate enough, as there are several instances of incorrect poses and the results show significant temporal jitter. Discriminative methods are further classified into learning-based methods and exemplar methods [7].

I. LEARNING BASED – DEEP LEARNING:
One important learning-based method is deep learning, which is based upon Artificial Neural Networks (ANNs). An ANN is analogous to the human brain: the units in an ANN represent the neurons in the human brain, and the weights represent the strength of the connections between neurons. Deep learning provides an end-to-end architecture that allows automatic learning of key information from images. One popular deep learning model which has been widely used for pose estimation is the Convolutional Neural Network (CNN), which will be discussed later. [20] have contributed to the research by using CNNs and stacked auto-encoder algorithms (SAE) for identifying yoga poses and Indian classical dance forms. However, their performance evaluation is done only on images and not on videos.

II. EXEMPLAR METHODS:
In exemplar methods, pose estimation is based on a distinct set of poses with their corresponding representations [7]. Classification algorithms such as random forests and randomized trees are robust and fast enough to handle this. A random forest is made up of multiple randomized decision trees and is hence called an ensemble classifier. It consists of non-terminal nodes which have a decision function to predict the similarities in images.


Fig. 8 gives an example. [19] used an improved version of random forests which contained two levels of random forests. The first layer acted as a discriminative classifier to classify body parts and passed the classification results to the second layer, which predicted joint locations in the body. Another approach similar to random forests is Hough forests [7], which consist of combinations of decision forests where the terminal node in each tree is either a regression or a classification node. Enhanced Hough trees have a parts objective ("PARTS"), which is an optimized objective based on discrete information gain.

C. TOP-DOWN
Most studies refer to generative methods as top-down methods [7]. The top-down approach obtains keypoints by using a module to detect human subjects, to which a pose estimator can then be applied. The primary advantage of top-down methods is their ability to break the task into multiple smaller tasks. These smaller tasks include detecting the object followed by pose estimation. For this, the detector needs to be powerful enough to detect hard or comparatively small objects so that the performance of the pose estimator is improved. [8] uses a top-down approach to articulated human pose estimation and tracking. Their top-down approach involves three components: a human candidate detector, a single-person pose estimator and a human pose tracker [8]. A general object detector is chosen to detect human candidates, after which a cascaded pyramid network is used to estimate the corresponding human pose, and finally a flow-based pose tracker is used to assign a unique and temporally consistent id to every human candidate for multi-pose tracking. Although this research develops a modular system for human pose estimation and tracking, it achieves an average precision of only 69.4 for pose estimation and 68.9 for pose tracking, leaving a lot of scope for improvement.

D. BOTTOM-UP:
This approach involves detecting human keypoints from potential subjects and assembling them into human limbs using some data association technique. The computation cost of this method does not depend on the number of human subjects in the images. Therefore, it provides a good trade-off between cost and accuracy. Some studies refer to bottom-up methods as discriminative methods [7]. [12] proposes a novel approach that combines traditional bottom-up and top-down methods for multi-person pose estimation. The feed-forward pass through the network is done in a bottom-up fashion, while the parsing of poses along with bounding box constraints is performed in a top-down fashion. The features from the image are relationships of connectivity between the joints. The remaining network is simply ResNet50, which is a deep neural network architecture. Although this research does not focus on classifying the poses, it shows how deep learning architectures like ResNet50 can be used to make human pose estimation accurate.

IV. KEYPOINT DETECTION METHODS

A. OPENPOSE:
OpenPose is a real-time multi-person keypoint detection library which brought a revolution to the field of pose estimation. It was developed at Carnegie Mellon University (CMU) by the Perceptual Computing Lab [13]. It uses a CNN-based architecture to identify facial, hand and foot keypoints of a human body from single images. OpenPose detects human body joints using an RGB camera. OpenPose keypoints include eyes, ears, neck, nose, elbows, shoulders, knees, wrists, ankles and hips. It presents the results


obtained by processing input from a camera in real time, from pre-recorded videos or from static images as 18 simple keypoints. Thus, it finds use in a variety of applications ranging from sports, surveillance and activity detection to yoga pose recognition. The work proposed in [18] uses OpenPose for initial keypoint identification followed by a CNN for classification of yoga poses. However, they achieve an accuracy of only 78%, which could be due to the limited dataset they used or to the architecture and hyperparameter tuning of their CNN model.

The first stage in OpenPose is detecting the keypoints of every person in the image, which is followed by assigning parts to each distinct individual. Fig. 2 depicts the architecture of the OpenPose model [13]. The OpenPose network begins with the extraction of features from the image using the initial layers (VGG-19, as shown in Fig. 2). These features are then passed to two branches of convolutional layers which run in parallel. The first branch predicts 18 confidence maps, each of which represents a specific part of the human body. The second branch predicts 38 Part Affinity Fields (PAFs), which indicate the degree of association between parts. Further stages are used to refine the predictions made by the previous stage. Bipartite graphs are formed between pairs of parts using the part confidence maps. The weaker links in these graphs are pruned using the PAF values. With these steps, human skeletons are estimated for every person in the frame or image.

B. POSENET:
PoseNet is another deep learning framework, similar to OpenPose, which is used for the identification of human poses in images or video sequences by identifying joint locations in a human body. These joint locations or keypoints are indexed by a "Part ID", and each has a confidence score whose value lies between 0.0 and 1.0, with 1.0 being the best. The PoseNet model's performance varies depending on the device and the output stride [14]. The PoseNet model is invariant to the size of the image, so it can predict pose positions at the scale of the original image regardless of whether the image has been downscaled.

C. PIFPAF:
PifPaf is another method based on the bottom-up approach for 2D multi-person human pose estimation. It uses a Part Intensity Field (PIF) for body part localization and a Part Association Field (PAF) for the association of body parts to form full human poses [15]. The model outperforms other methods at lower resolution and performs better in crowded places mainly because of the following: (a) fine-grained information encoded in a newer composite field, the PAF, and (b) the choice of a Laplace loss that incorporates a notion of uncertainty. The model architecture is based on a fully convolutional, box-free design [15].
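The keypoint detectors described above all return, for each body part, a location together with a confidence score. The sketch below shows one way such output might be represented and filtered before it is passed to a classifier. The `Keypoint` type, the part names, the coordinates and the 0.5 threshold are illustrative assumptions and do not reproduce the actual output format of OpenPose, PoseNet or PifPaf.

```python
from typing import List, NamedTuple


class Keypoint(NamedTuple):
    part: str     # name of the body part (hypothetical naming)
    x: float      # horizontal position in image pixels
    y: float      # vertical position in image pixels
    score: float  # confidence in [0.0, 1.0], with 1.0 being the best


def confident_keypoints(keypoints: List[Keypoint], min_score: float = 0.5) -> List[Keypoint]:
    """Keep only the keypoints whose confidence reaches the threshold."""
    return [kp for kp in keypoints if kp.score >= min_score]


def missing_parts(keypoints: List[Keypoint], min_score: float = 0.5) -> List[str]:
    """Names of the parts that were detected with too little confidence."""
    return [kp.part for kp in keypoints if kp.score < min_score]


# Hypothetical detector output for a single person.
pose = [
    Keypoint("nose", 120.0, 80.0, 0.98),
    Keypoint("left_wrist", 60.0, 200.0, 0.31),
    Keypoint("right_knee", 140.0, 310.0, 0.75),
]

print([kp.part for kp in confident_keypoints(pose)])
print(missing_parts(pose))
```

Dropping or flagging low-confidence keypoints in this way is one simple option for handling occluded joints; a real system might instead interpolate them from neighbouring frames.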


Fig. 4 represents the architecture of PifPaf [15]. The input image is of size (H, W) and has three RGB channels, which is shown by 'x3'. The encoder is based on neural networks, and it produces the PIF field with 17×5 channels and the PAF field with 19×7 channels. '//2' represents an operation with a stride of 2. The decoder converts the PIF and PAF fields into pose coordinates which have 17 joints each. Each joint is a 2D representation and has X and Y coordinates along with a confidence score.

This model architecture is a ResNet-based network. The confidence, precise location and joint size are predicted by one of the head networks, which is called PIF, and the associations between the different parts are predicted by the other head network, which is PAF. The method is hence called PifPaf.

V. POSE CLASSIFICATION USING DEEP LEARNING

Deep learning is widely used for image classification tasks, wherein the model takes images as input and outputs a prediction. Deep learning algorithms use neural networks to determine the relationship between the input and the output. For pose estimation problems, the image with the pose of a human is taken as input, and the deep learning model tries to learn the different poses effectively so as to classify them accurately. As one can guess, this could be a computationally expensive task if the number of images is large. Also, as we want accurate results, we would not want to compromise on the quality of the images, as that could affect the features extracted by the model. Hence, in this project we propose using OpenPose (a pretrained model) to extract keypoints of the human joint locations from the images and then training the deep learning model on these keypoints. Below are some basic deep learning models used for classification problems.

A. MULTILAYER PERCEPTRON (MLP)
An MLP is a classical neural network that has one input and one output layer. The intermediate layers between the input and output layer are known as hidden layers. There can be one or more hidden layers. MLPs form a fully connected network, as each node in one layer has a connection to each node in the next layer. A fully connected network is a foundation for deep learning. MLPs are popular for supervised classification, where the input data is assigned a label or class. [21] uses an MLP for human pose classification by extracting keypoints from low-resolution images using a Kinect sensor.

B. RECURRENT NEURAL NETWORK (RNN)
RNNs are neural network architectures that are used for sequence prediction problems. Sequence prediction problems can be one to many, many to one, or many to many. In RNNs, the previous data of a neuron is preserved, which helps in handling sequential data. Hence, the context is preserved, and the output is generated considering the previously learned data. RNNs are most commonly used for natural language processing (NLP) problems, where the data is naturally modeled in sequences.
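The way an RNN preserves context can be made concrete with a small many-to-one sketch. It assumes the simplest (Elman-style) recurrence, in which the hidden state is updated as h_t = tanh(W_x x_t + W_h h_(t-1) + b). The function names and the toy weights below are hypothetical and written in plain Python for readability, so this is an illustration of the idea rather than the model used in the paper.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_step(x, h_prev, w_x, w_h, b):
    """One recurrent step: combine the current input with the previous hidden state."""
    from_input = matvec(w_x, x)
    from_state = matvec(w_h, h_prev)
    return [math.tanh(i + s + bias) for i, s, bias in zip(from_input, from_state, b)]


def rnn_forward(sequence, w_x, w_h, b):
    """Many-to-one pass: return the hidden state after the last element."""
    h = [0.0] * len(b)  # the initial hidden state carries no context
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)
    return h


# Toy weights for a one-dimensional input and a one-dimensional hidden state.
W_X = [[1.0]]
W_H = [[0.5]]
B = [0.0]

forward = rnn_forward([[1.0], [0.0]], W_X, W_H, B)
backward = rnn_forward([[0.0], [1.0]], W_X, W_H, B)
print(forward, backward)
```

The two calls see the same inputs in a different order and end in different hidden states, which is the property that makes a recurrent model sensitive to the order of a sequence.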


However, in activity recognition or pose classification tasks as well, there is a dependency between the previously performed action and the next action. In the case of yoga too, the context or information of the initial or intermediate poses is important in predicting the final pose. Yoga can thus be thought of as a sequence of poses. This makes RNNs a suitable choice for yoga pose classification, as sequential evaluation of joint locations may better capture the dependency between joint locations. For the same reason, [22] use an RNN for human pose estimation.

The problem with RNNs, however, is that they cannot preserve long-term dependencies. Sometimes recent information is sufficient to perform the present task, but there are cases when the gap between the relevant information and the present task becomes too large. In such cases RNNs fail, as they cannot connect this information. In the context of yoga, if there are too many intermediate steps in a yoga asana, RNNs find it difficult to keep track of the initial steps which are needed in order to predict the current step. This problem is called the long-term dependency problem.

I. LONG SHORT-TERM MEMORY (LSTM):
In order to deal with the above long-term dependency problem, a special kind of RNN exists which is called LSTM. An LSTM is a popular RNN that can easily remember information or data for long periods of time, which is its default behavior. The key idea which makes this possible is the cell state. The cell state allows information to flow unaltered. It can be thought of as a conveyor belt. LSTMs can add and remove data from the cell state using regulatory structures known as gates. These special gates optionally let information through. An LSTM uses three gates, namely the input, forget and output gates. An LSTM can thus selectively forget or remember what it has learned. As LSTMs allow for longer retention of the input state in the network, they can efficiently process long sequences and give good results. [23] discusses how LSTM can be used with CNN for human activity recognition to achieve high accuracy.

D. CONVOLUTIONAL NEURAL NETWORK (CNN)
A CNN is a type of neural network which is widely used in the computer vision domain. It has proven to be so effective that it has become the go-to method for most image data. CNNs consist of at least one convolutional layer, which is the first layer and is responsible for feature extraction from the image. CNNs perform feature extraction by applying convolutional filters to the input, analyzing some parts of the input at a given time before sending the output to the subsequent layer. The convolutional layer, through the use of convolutional filters, generates what is known as a feature map. With the help of a pooling layer, the dimensionality is reduced, which reduces the training time and prevents overfitting. The most common pooling layer used is max pooling,


which takes the maximum value in the pooling window.

CNNs show great promise in pose classification tasks, which makes them a highly desirable choice. They can be trained on keypoints of joint locations of the human skeleton or directly on the images. [4] used a CNN to detect human poses from 2D human exercise images and achieved an accuracy of 83%. On the other hand, [18] used a CNN on OpenPose keypoints to classify yoga poses and achieved an accuracy of 78%. Although the accuracies are not directly comparable, as the dataset, the CNN architecture and the activities being classified are different, [18] shows that using CNNs on OpenPose keypoints is worth exploring.

In the case of keypoints, the CNN extracts features from the 2D coordinates of the OpenPose keypoints using the same convolutional filter technique explained above. Based on the filter size, the convolutional filter slides to the next set of input. After the convolution, an activation function, the Rectified Linear Unit (ReLU), is typically applied to add nonlinearity to the CNN, as real-world data is mostly nonlinear and the convolution operation by itself is linear. Tanh and sigmoid are other activation functions, but ReLU is mostly used because of its better performance.

VI. SUMMARY OF CURRENT STATE OF THE ART

A lot of work has been done in the past on building automated or semi-automated systems which help to analyze exercise and sports activities such as swimming [24], basketball [25] etc. Patil et al. [26] proposed a system for identifying yoga posture differences between an expert and a practitioner using speeded up robust features (SURF), which uses information about image contours. However, describing and comparing the postures closely by using only the contour information is not sufficient.

A system for yoga training has been proposed by Luo et al. [27] which consists of inertial measurement units (IMUs) and tactors. However, this can be uncomfortable for the user and at the same time affect the natural yoga pose. [28] presented a system for yoga pose detection for six poses using an Adaboost classifier and Kinect sensors and achieved an accuracy of 94.8%. However, they used a depth sensor based camera that may not always be accessible to users. Another system for yoga pose correction using Kinect has been presented by [29], which considers three yoga poses: warrior III, downward dog and tree pose. However, their results are not very good, and their accuracy score is only 82.84%. The traditional method of skeletonization has now been replaced by deep learning-based methods.

Deep learning is a promising domain where a lot of research is being done, enabling us to analyze huge amounts of data in a scalable manner. Compared to traditional machine learning models, where feature extraction and engineering is a must, deep learning eliminates the need to do so by understanding complex patterns in the data and extracting features on its own.

VII. HYPOTHESIS

A deep learning model for classifying yoga poses can be built where the initial


keypoint extraction of the human joint locations is done using OpenPose. The model can incorporate the feature extraction capabilities of CNN along with the context retention capabilities of LSTM to effectively classify yoga poses in prerecorded videos and also in real time [2]. This model can be considered a hybrid model.

We also plan to experiment with basic CNN networks and compare their performance with that of the hybrid model. Kinect sensors could be a way to perform human pose estimation, but they require additional equipment and specialized hardware, and their performance is not always good in different surroundings. Machine learning models, although not widely used for human pose estimation, will be explored for comparison with the deep learning models. The evaluation of the yoga pose classification system will be done using classification scores, a confusion matrix and evaluations by people. The system will predict the yoga pose being performed by the user in real time, and we can examine whether the prediction made by the system is correct. The results will also be compared with existing methods.

VIII. EVALUATION METRICS

A. Classification Score:
Classification score refers to what we usually mean by the accuracy of the model. It can be described as the ratio of the number of predictions that were correct to the total number of input samples:

Accuracy = (number of correct predictions) / (total number of input samples)

In the case of multiclass classification, this metric gives good results when the number of samples in each class is almost the same.

B. Confusion matrix
A confusion matrix is a matrix which describes the accuracy of the model completely. There are four important terms when it comes to measuring the performance of a model.

✓ True Positive: Predicted value and the actual output are both 1.
✓ True Negative: Predicted value and the actual output are both 0.
✓ False Positive: Predicted value is 1 but the actual output is 0.
✓ False Negative: Predicted value is 0 but the actual output is 1.

Fig. 9 shows a basic confusion matrix for binary classification. The diagonal values represent the samples that are correctly classified, and thus we always want the diagonal of the matrix to contain the maximum values. In the case of multiclass classification, each class corresponds to one row and one column of the matrix.

C. Model accuracy and model loss curve:
These curves are also referred to as learning curves and are mostly used for models that learn incrementally over time, for example, neural networks. They represent the evaluation on the training and validation data, which gives us an idea of how well the model is learning and how well it is generalizing. The


model loss curve represents a minimizing score (loss), which means that a lower score results in better model performance. The model accuracy curve represents a maximizing score (accuracy), which means that a higher score denotes better performance of the model. A good-fitting model loss curve is one in which the training and validation loss decrease, reach a point of stability, and have a minimal gap between the final loss values. Likewise, a good-fitting model accuracy curve is one in which the training and validation accuracy increase and become stable, with a minimal gap between the final accuracy values.

IX. LITERATURE REVIEW

Machine learning methods may rely heavily on heuristic, manual feature extraction in everyday human activity recognition tasks. This is limited by human domain knowledge. To address this problem, authors have opted for techniques such as deep learning. These methods can automatically extract specific features from raw sensor data during the training stage, and then present low-level temporal features together with high-level abstract sequences. Given the application of deep learning approaches in fields such as image classification, voice recognition, natural language processing, and others, deep learning has been developing into a novel research direction in pattern recognition and is being transferred to the area of human activity recognition (HAR). Table 2 shows a few of the existing machine learning methods along with their HAR accuracy. Here, we have mentioned only the methods that give the best accuracy for the largest number of subjects. For instance, if the subject count is smaller, the accuracy may be better.

Chen et al. [30] proposed a system for recognizing a yoga pose using a Kinect camera. They collected a total of 300 videos of 12 yoga poses from 5 yoga experts, with each pose performed five times. First, the foreground part is segmented from the frame, and then the star skeleton is used; the system obtained an accuracy of 99.33%. The authors in [31] proposed a pose detection system using a Kinect camera with a resolution of 640 × 480. They observed six poses performed by five volunteers. They extracted 21,000 frames from the Kinect camera using a background subtraction technique, with 74% accuracy. Later, the authors in [32] proposed another pose detection approach using a Kinect camera. Also, Wang et al. [33] proposed a posture recognition approach using the Kinect camera. They extracted the human silhouette and used a learning vector quantization neural network for five basic postures. The system accuracy was approximately 98%. Nonetheless, although these approaches have high accuracy rates, they are regarded as privacy intrusive.
Yao et al. [34] proposed a human posture identification system using a passive RFID signal. This approach recognizes human postures based on the analysis of RSSI signal patterns, which are produced while an individual performs the posture between an RFID tag array and an RFID receiving


apparatus. The system achieved an accuracy of 99 percent for 12 postures. Despite its high accuracy, the RSSI signal is environment dependent; moreover, the installation and maintenance of this approach are complex as a direct result of its many components. In [35], the authors developed a smart prayer mat that recognizes four poses observed during prayer. The mat has a few force-sensing resistive strips inside and detects the area where the person's body is pressing. The mat recognized the postures collected from 30 individuals with 100% accuracy. However, such an approach cannot recognize body parts that are not pressed on the mat. An earlier version [36] of this work uses similar hardware, including three-axis (x, y, and z) nodes, for a privacy-preserving posture recognition system. Experimental results achieved a high average F1-score of approximately 98% across various combinations of axes. However, the results rely on only six postures collected from four volunteers.
In [37], a gradient histogram and Fourier descriptors based on centroid features are used. Then, Jain et al. [37] used two classifiers, SVM and K-NN, to recognize the activities of two public datasets. They used six inertial measurement units to build the system. The authors used the Random Forest classifier to classify the activities. Finally, an overall accuracy of 84.6% was achieved. The authors in [38] proposed a classifier integrating a deep CNN and LSTM for classifying hand movements of 5 gestures, with F1 scores of 0.93 and 0.95 for the two classifiers, respectively. Lin et al. [39] presented an iterative CNN approach with autocorrelation preprocessing, instead of conventional micro-Doppler preprocessing, which can accurately classify seven activities or five subjects. Furthermore, this system used an iterative deep learning framework to automatically define and extract features. Finally, traditional classifiers were used to classify different activities based on input radar signals. Although the above models could, in general, recognize human activities, the overall network structure is moderately complex. A lightweight deep learning model for HAR was proposed by [40] and deployed on a Raspberry Pi 3. This model was built using a shallow RNN in combination with LSTM, and its overall accuracy on the WISDM dataset reached around 96%. Although this model has high accuracy and a good design, it was evaluated on only a single dataset with just six activities, which does not demonstrate that the proposed design has strong generalization ability. In [41], a deep learning design named Inno-HAR was presented, combining an inception NN with an RNN for activity classification. The authors used separable convolution to replace conventional convolution, which achieved the goal of reducing model parameters. The results showed an excellent effect. However, the model converged slowly, causing a great deal of time to be spent on training [42].
Many of these datasets are collected from sources such as online videos, movies, images, sports videos, etc. Some of them provide rich label information; however, they lack human pose diversity. Chen et al. [43] recently observed that the images considered for annotation are of high quality with large target objects. For instance, in the MPII dataset, around 70% of the images contain person objects with a height in excess of 250 pixels. Thus, lacking diversity in individual poses and target object size, these datasets cannot meet the high-quality requirements of applications such as behavior analysis. Few works have been done on yoga pose classification for applications such as self-training [44]. Moreover, these works involve yoga datasets with a small number of images or videos and do not consider the vast variety of poses. Thus, they lack generalization


and are far from complex yoga pose classification.

The authors in , applied pose classification to classical dance and Yoga poses using CNN and SAE algorithms. However, they evaluated their results on static images and not on videos. OpenPose is a real-time system introduced by the Perceptual Computing Lab of Carnegie Mellon University (CMU) to jointly detect human body, hand, facial, and foot key points on single images. It is important for pose recognition and provides the human body joint locations using convolutional neural networks (CNNs).
The authors in propose a smart framework capable of recognizing six poses for learning Yoga that can capture up to 6 people simultaneously with an interactive Kinect device. It is also integrated with command voices to visualize the instructions and images about the poses. For recognition, the framework used the AdaBoost algorithm. The input data was prepared by an expert yoga instructor, and the framework obtained an accuracy of 94.58%. The authors in proposed an IoT-based system, described as a privacy-preserving yoga pose detection framework, using a deep CNN (DCNN) on low-resolution sensor images. They used a WSN consisting of three nodes, a sensor module, and a Wi-Fi module to connect to the server. They collected data from 18 volunteers on 26 different yoga poses, with a video length of 20 seconds. From the video file, they generated images and applied DCNN models to them. The F1 score of this method after ten-fold cross-validation is about 0.99 and 0.98 with three axes and with only a single axis (y-axis), respectively. The server-side strategy of their work is shown in Figure 4. The three different frames in three directions are combined, and the deep CNN is applied to the image for classification to recognize the pose in the given image .

Figure 10: server-side classification strategy

In [45], the authors proposed a yoga pose recognition system using a 2-stage classifier (i.e., a Back-Propagation Artificial Neural Network (BP-ANN) and fuzzy C-means) along with evaluation metrics to guide the learners. As the first classifier, they used the BP-ANN to divide the yoga poses into different categories, and as the second classifier they used a fuzzy C-means classifier to classify the poses of different kinds. To evaluate the method, they used a wearable device with 11 IMUs and a pose database with 11 subjects, approximately 211 thousand data frames, and 1800 pose instances. 30% of the data was used for training and the remainder as test data, and the result was 95.39% accuracy. Many earlier works preceded this 2-stage classification, using techniques such as HMM, SVM, and decision trees. For example, Kang et al. [46] applied a DT for posture recognition using representation features based on the body skeleton model. Although the computing cost is not high, the accuracy was moderately low. Wu et al. [47] proposed three measures, namely joint angle, arm direction, and natural movement type, which could be used in general to evaluate the forearm and upper arm. The authors in proposed an approach for yoga pose analysis using pose recognition. According to this method, it first


recognizes a pose using OpenPose and a camera. Then, it computes the difference in body angles between an instructor and the user. If the difference is greater than a given threshold, the method suggests a correction for that body part. With this proposal, it is expected that people can practice Yoga anywhere, including at home. Thus, everyone can practice Yoga regardless of age or health. For evaluation, the authors applied the proposal to 4 different situations, namely different body sizes, different heights, different ages, and different camera distances, with three Yoga poses. Prior to this work, the authors in, introduced video modeling and feedback. They delivered instructions or self-monitored feedback to students to improve behavior changes. The motivation behind this study was to evaluate video self-evaluations. Accuracy was determined by dividing the number of correct steps by the total number of steps obtained by analyzing the task. The authors in , proposed a deep neural network that combines CNN with an LSTM approach. The features are extracted automatically, and classification was performed with few model parameters. LSTM is a variant of the RNN that is more suitable for processing temporal sequences. According to this method, the data is collected from sensors and fed to a 2-layer LSTM, along with convolutional layers. A Global Average Pooling layer and Batch Normalization are also applied to speed up convergence and achieve better results. To demonstrate the generalizability and effectiveness of their work, they used three public datasets (UCI-HAR, WISDM, and OPPORTUNITY), and a summary of these datasets is shown in Table 3. In addition to accuracy, they also considered the F1 score for evaluation of the model and achieved approximately 96%, 96%, and 93% for these three datasets, respectively.

The authors in proposed a technique to recognize Yoga poses accurately using deep learning. For the experiment, they considered the public dataset, which consists of a set of six yoga asanas from 15 volunteers (10 male, five female) recorded with a standard camera. The authors used a hybrid of CNN (used for feature extraction) and LSTM (used for temporal predictions) to recognize Yoga poses in videos and achieved an accuracy of around 99%. They also evaluated the work in real time with 12 different people (five males, seven females) and obtained almost 98% accuracy. The key points detected by OpenPose are shown graphically in Figure 5, and the corresponding key points of the human body are presented in Table 4.

Figure 11. Key points of the human body detected by OpenPose

The existing datasets for pose recognition are poor in terms of diversity, occlusion, and viewpoints. As such, pose recognition is limited to a small


number of poses. The authors in [48] proposed the idea of fine-grained pose classification, in which they formulated pose recognition not as a simple classification problem but as a fine-grained one. The authors also proposed a dataset named Yoga-82 for large-scale classification with 82 classes. Unlike other datasets, Yoga-82 is designed specifically for Yoga poses, with many variations of the poses.

The comparison of this dataset with other datasets is shown in Table 5. The Yoga-82 dataset contains three hierarchical levels, including body positions, variations, and names of the poses. The contributions of their work include fine-grained classification with a large-scale dataset and an evaluation of performance with CNNs. They also modified the architecture of DenseNet to utilize the hierarchy of the dataset and to obtain accurate pose recognition results.

X. Research Methodology

Our approach aims to automatically recognize the user's Yoga asanas from real-time and recorded videos. The method can be decomposed into four main steps. First, data collection is performed, which can either be a real-time process running in parallel with detection or can use previously recorded videos. Second, OpenPose is used to detect the joint locations using Part Confidence Maps and Part Affinity Fields followed by bipartite matching and parsing. The detected keypoints are passed to our model, where the CNN finds patterns and the LSTM analyzes their change over time. Finally, the model and training method for framewise prediction and the polling approach on 45 frames (1.5 s) of output are discussed.

10.1 Pose Extraction
This is the first step of our pipeline, and the OpenPose library is used for it. In the case of recorded videos, this step takes place offline, whereas for real-time predictions it takes place online, using input from the camera to supply keypoints to the proposed model. OpenPose is an open-source library for multi-person keypoint detection, which detects human body, hand, and facial keypoints jointly [49]. The positions of the 18 keypoints tracked by OpenPose, namely ears, eyes, nose, neck, shoulders, hips, knees, ankles, elbows, and wrists, are shown in Fig. 1. The output corresponding to each frame of a video is obtained in JSON format, which contains the body part locations for each person detected in the image. Pose extraction was performed at the default resolution of the OpenPose network for optimal performance. The system operated at around 3 FPS at these settings. Figure 2 shows the proposed system architecture, where OpenPose is used for keypoint extraction followed by the CNN and LSTM model to predict the user's asanas. We used videos with distinct subjects for the training, test, and validation sets, with a 60:20:20 split at the video level. After preprocessing, we get around 8000, 2500, and 2300 cases for training, testing, and validation, respectively. This deviates from 60:20:20 at the video level because of the variation in length of the videos.
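As a rough illustration of the pose extraction step, the sketch below reads one per-frame OpenPose JSON document and keeps only the X and Y values of the 18 body keypoints. It assumes the layout written by recent OpenPose releases (a "people" list whose entries carry a flat "pose_keypoints_2d" array of x, y, confidence triples); the helper name `keypoints_from_openpose_json` and the example values are hypothetical and not taken from the paper.

```python
import json

NUM_KEYPOINTS = 18  # 18-keypoint body model, as described in the text


def keypoints_from_openpose_json(text, person_index=0):
    """Return 18 (x, y) tuples for one detected person, or None.

    Assumption: the JSON follows OpenPose's per-frame output, i.e. a
    "people" list whose entries hold a flat "pose_keypoints_2d" array
    of [x, y, confidence] triples.
    """
    frame = json.loads(text)
    people = frame.get("people", [])
    if len(people) <= person_index:
        return None  # nobody detected in this frame
    flat = people[person_index].get("pose_keypoints_2d", [])
    if len(flat) < 3 * NUM_KEYPOINTS:
        return None  # incomplete detection
    # Keep x and y and drop the confidence value of every triple.
    return [(flat[3 * i], flat[3 * i + 1]) for i in range(NUM_KEYPOINTS)]


# Made-up frame with a single person; the numbers carry no meaning.
example = json.dumps({
    "version": 1.3,
    "people": [
        {"pose_keypoints_2d": [float(v) for v in range(3 * NUM_KEYPOINTS)]}
    ],
})
points = keypoints_from_openpose_json(example)
```

Returning `None` for empty or incomplete detections leaves it to the caller to decide whether to skip such frames or fill them in; the paper does not say which was done.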


Fig 12. System architecture: OpenPose followed by CNN and LSTM model

10.2 Model
The deep learning model used here is a combination of CNN and LSTM (Fig. 2). CNNs are commonly used for pattern recognition problems, and LSTMs are used for time-series tasks. In our work, the time-distributed CNN layer is used to extract features from the 2-D coordinates of the keypoints obtained in the previous step. The LSTM layer analyzes the change in these features over the frames, and the probability of each Yoga pose in a frame is given by the Softmax layer. Thresholding is performed on this value to detect frames where the user is not performing Yoga, and the effect of polling over 45 frames has been studied. The model has been programmed using the Keras Sequential API in Python. The input case has a shape of 45 × 18 × 2, which denotes 45 sequential frames with 18 keypoints having X and Y coordinates each. A time-distributed CNN layer with 16 filters of size 3 × 3 and ReLU activation is applied to the keypoints of each frame for feature extraction. CNNs have a strong ability to extract spatial features that are scale and rotation invariant. The CNN layer can extract spatial features such as relative distances and angles between the various keypoints in a frame. Batch normalization is applied to the CNN output for faster convergence. This is followed by a dropout layer, which randomly drops a fraction of the weights to prevent overfitting. The output from the CNN, applied on each of the 45 frames, is then flattened and passed to an LSTM layer with 20 units and a unit forget bias of 0.5. The LSTM is used to identify temporal changes in the features extracted by the CNN layer. This leverages the sequential nature of video input, and the entire Yoga pose, from its formation through holding and release, is treated as one activity. The output of the LSTM layer corresponding to each frame is passed to a time-distributed fully connected layer with Softmax activation and six outputs. Each of these six outputs gives the probability of the corresponding Yoga pose in terms of cross-entropy. Thresholding is applied to this output to detect when the user is not performing Yoga. Although the model uses an LSTM for capturing temporal relationships, the results are provided for each frame in the sequence and then polled over the entire sequence of 45 frames. This is further elaborated in the results section.

10.3 Training
Our task is to recognize the user's asanas with good accuracy in real time. First, keypoint features are extracted using OpenPose and the joint location values are recorded in JSON files, and then the CNN and LSTM models are applied for the prediction of asanas. Thanks to the combination of both, we get the best set of features filtered by the CNN and long-term data dependencies established using the LSTM. The model is compiled using Keras with a Theano backend. The categorical cross-entropy loss function is used because of its suitability for measuring the performance of the fully connected layer's output with Softmax activation. The Adam optimizer with an initial learning rate of 0.0001, a beta of 0.9, and no decay is used to control the learning rate.
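The architecture and training settings described above can be approximated with the following minimal sketch. It is not the authors' code: it uses `tensorflow.keras` rather than a Theano backend, interprets the convolution as a 1-D convolution with kernel size 3 over the 18 keypoints (the text says 3 × 3, but each frame is only 18 × 2), and picks a dropout rate of 0.5 because none is given. The "forget bias of 0.5" is not reproduced, since the Keras `LSTM` layer only exposes a boolean `unit_forget_bias` flag.

```python
from tensorflow import keras
from tensorflow.keras import layers

FRAMES, KEYPOINTS, COORDS, CLASSES = 45, 18, 2, 6

model = keras.Sequential([
    keras.Input(shape=(FRAMES, KEYPOINTS, COORDS)),
    # Per-frame feature extraction over the keypoints (X, Y as channels).
    # Assumption: Conv1D with kernel size 3 stands in for the "3 x 3" filters.
    layers.TimeDistributed(layers.Conv1D(16, 3, activation="relu")),
    layers.BatchNormalization(),
    layers.Dropout(0.5),  # assumed rate; the paper does not state one
    layers.TimeDistributed(layers.Flatten()),
    # 20 LSTM units; one output per frame keeps the predictions framewise.
    layers.LSTM(20, return_sequences=True),
    # Six softmax outputs per frame, one per yoga pose.
    layers.TimeDistributed(layers.Dense(CLASSES, activation="softmax")),
])

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-4, beta_1=0.9),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```

With these choices the model maps a batch of shape (N, 45, 18, 2) to framewise class probabilities of shape (N, 45, 6), which matches the framewise prediction followed by polling described in the text.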


Fig. 13. Model loss over the epochs for the framewise approach

The model has been trained for 100 epochs on a system with an Intel i7 6700HQ processor, 16GB RAM, and an Nvidia GTX 960M GPU. Training takes around 22 s per epoch, which is relatively quick due to the simple inputs and compact design. Figures 3 and 4 show the change in accuracy and loss function, respectively, over the course of training. Initially, the training and validation accuracies increase rapidly, with the validation accuracy remaining above the training accuracy, indicating good generalization. Later, the growth is gradual, and convergence occurs after 20 epochs. The accuracy and loss approach their asymptotic values after 40 epochs, with minor noise in between. The weights of the best-fitting model with the highest validation accuracy are saved for further testing. Both training and validation loss decreased steadily and converged, indicating a well-fitting model.

XI. Data Analysis

11.1 Dataset
The dataset used for this project is part of an open-source collection and is publicly available [50]. This dataset was created by [2]. It consists of videos of 6 yoga poses performed by 15 different individuals (5 females and 10 males). The 6 yoga poses are Bhujangasana (Cobra pose), Padmasana (Lotus pose), Shavasana (Corpse pose), Tadasana (Mountain pose), Trikonasana (Triangle pose), and Vrikshasana (Tree pose). The total number of videos is 88, with a total duration of 1 hour 6 minutes and 5 seconds. The videos were recorded at 30 FPS (frames per second). All the videos were recorded in an indoor environment at a distance of 4 meters from the camera. All individuals performed the yoga poses with variations, to help build a dataset that can be used to build a robust yoga pose recognition system. The average length of the videos is around 45 to 60 seconds. Fig. 10 represents the number of videos for each yoga pose performed by the number of individuals [2].


Table 6. Dataset details

The frames from videos of individuals performing different yoga poses are shown in Table 11. Videos with different subjects have been used for the train, test and validation sets.

Fig 14. Dataset Frames

11.2 Data Preprocessing
The first step in preprocessing the data is extracting keypoints of poses in video frames using the OpenPose library. For recorded videos, pose extraction is done offline, while in real time it is done online, wherein keypoints detected from the camera input are provided to the model. OpenPose is run on each frame of the video, and the corresponding output of each frame is stored in JSON format. This JSON data includes the locations of the body parts of each person detected in the video frame. The default setting of OpenPose has been used for extracting pose keypoints for optimal performance. Fig. 12 depicts the 18 keypoint positions captured by OpenPose [2].

Fig 15. Keypoints identified by OpenPose

The JSON data is read and stored in NumPy arrays in sequences of 45 frames, which is about 1.5 seconds of video [2]. 60% of the dataset has been used for training, 20% for testing, and 20% for validation. The training data has 7989 sequences of 45 frames, each containing the 2D coordinates of the 18 keypoints captured by OpenPose. The validation data consists of 2224 such sequences, and the test data contains 2598 sequences. The number of frames deviated from the 60:20:20 split made at the video level. This was because of the difference in duration of the videos. Fig. 13 shows the keypoints detected by OpenPose on each of the 6 yoga poses [2].
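The grouping of per-frame keypoints into 45-frame sequences can be sketched as a sliding window. The paper does not state whether consecutive sequences overlap, so the `stride` parameter below is an assumption: `stride=45` gives non-overlapping windows, and smaller values give overlapping ones. The function name `make_sequences` and the dummy video are illustrative only.

```python
def make_sequences(frames, window=45, stride=45):
    """Group per-frame keypoint lists into fixed-length sequences.

    frames: list of frames, each a list of 18 (x, y) tuples.
    stride: step between window starts (assumed; not given in the paper).
    Trailing frames that do not fill a whole window are dropped.
    """
    sequences = []
    for start in range(0, len(frames) - window + 1, stride):
        sequences.append(frames[start:start + window])
    return sequences


# Dummy video: 100 frames, each holding 18 made-up (x, y) keypoints.
video = [[(float(f), float(k)) for k in range(18)] for f in range(100)]

non_overlapping = make_sequences(video)         # window starts: 0, 45
overlapping = make_sequences(video, stride=10)  # window starts: 0, 10, ..., 50
```

Each resulting sequence has the 45 × 18 × 2 layout that the model input expects once converted to an array.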


Fig 16. OpenPose on different poses

XII. Interpretation And Recommendation

12.1 Model Performance and Result

Training Setup:
The models are built using Python libraries such as TensorFlow-Keras (Theano backend), OpenPose, NumPy, and scikit-learn on a system with an NVIDIA Tesla 1080 GPU having 4 GB memory.
1. Support Vector Machine (SVM)
SVM is a supervised machine learning model that is inherently a two-class classifier. However, since most problems involve multiple classes, a multiclass SVM is often used. A multiclass SVM forms multiple two-class classifiers and distinguishes the classifiers based on the distinct label versus the rest (one-versus-rest or one-versus-all) or between each pair of classes (one-versus-one). SVM performs classification by constructing a hyperplane such that the separation between classes is as wide as possible.
A default SVM has been trained on the training data with the radial basis function (RBF) kernel. RBF is the default and most popular kernel, and is a Gaussian radial basis function. It offers more flexibility than other kernels such as linear and polynomial. The value of the soft margin parameter C is 1, and the decision function is one-versus-rest. The keypoints captured using OpenPose are used as features for the SVM. These 18 keypoints are represented by X and Y coordinates, which makes the total number of features 36 (18 * 2). The data is reshaped to make the number of samples equal.
A. Results:
Train accuracy: 0.9953
Validation accuracy: 0.9762
Test accuracy: 0.9319
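A minimal sketch of the SVM baseline and of the two evaluation metrics used for it (classification score and confusion matrix) is given below, using scikit-learn. The data is synthetic and stands in for the flattened 36-value keypoint vectors; `synthetic_split` is a hypothetical helper, and the numbers it yields say nothing about the accuracies reported in the paper.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.svm import SVC

rng = np.random.default_rng(0)
N_CLASSES, N_FEATURES = 6, 36  # 6 poses; 18 keypoints x (X, Y)


def synthetic_split(samples_per_class):
    """Made-up stand-in for flattened keypoints: one cluster per pose."""
    centers = np.arange(N_CLASSES)[:, None] * 5.0
    X = np.concatenate([
        centers[c] + rng.normal(size=(samples_per_class, N_FEATURES))
        for c in range(N_CLASSES)
    ])
    y = np.repeat(np.arange(N_CLASSES), samples_per_class)
    return X, y


X_train, y_train = synthetic_split(50)
X_test, y_test = synthetic_split(20)

# RBF kernel, C = 1 and a one-versus-rest decision function, as in the text.
clf = SVC(kernel="rbf", C=1.0, decision_function_shape="ovr")
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Classification score: correct predictions divided by total samples.
accuracy = accuracy_score(y_test, y_pred)
# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
```

The diagonal of `cm` counts correctly classified samples per class, so off-diagonal entries would show confusions such as the tadasana and vrikshasana mix-up discussed in the analysis.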


B. Analysis
The training accuracy of the model is quite high at 0.99. There is a slight decrease in the validation and test accuracy, but the results are still acceptable. We can see in the confusion matrix that most classes are classified correctly except tadasana (mountain pose). Out of 17,685 frames for tadasana, 6992 have been misclassified as vrikshasana (tree pose), and similarly there is some incorrect classification for vrikshasana. This could be because of the similarity of the poses, as both of them require a standing position and the initial pose formation is similar.

2. Convolutional Neural Network
A one-dimensional, one-layer CNN with 16 filters of size 3 x 3 is trained on the OpenPose keypoints. The input shape is 18 x 2, which denotes the 18 keypoints having X and Y coordinates. Batch normalization is applied to the output of the CNN layer so that the model converges faster. We also have a dropout layer that prevents overfitting by randomly dropping some fraction of the weights. The activation function used is the Rectified Linear Unit (ReLU), which is applied for feature extraction on the keypoints of each frame. The final output is flattened before being passed to the dense layer with softmax activation and 6 units, where each unit represents the probability of a yoga pose in cross-entropy terms for each of the 6 classes. The model architecture summary is shown in Fig. 14.

Fig 17. Model Layers

The loss function used for compiling the model is categorical cross-entropy, which is also called softmax loss. It is used because it allows measuring the performance of the output of the densely connected layer with softmax activation. This loss function is used for multiclass classification, and as we have multiple yoga pose classes, it makes sense to use categorical cross-entropy. Finally, to manage the learning rate, the Adam optimizer with an initial learning rate of 0.0001 is used. The total number of epochs for which the model is trained is 100.
A. Results:
Train accuracy: 0.9878
Validation accuracy: 0.9921
Test accuracy: 0.9858


B. Analysis
The training, validation and test accuracy of the model are nearly the same, around 0.99. The confusion matrix further shows that the model does a good job of classifying all samples correctly, except for some samples of vrikshasana that are misclassified as tadasana, leading to 93% accuracy for vrikshasana. Compared to SVM, there are fewer misclassifications with the CNN. However, the model loss curve above shows an increase in the validation loss and a decrease in the training loss, which indicates some overfitting.

3. Convolutional Neural Network + Long Short Term Memory
A deep learning model combining CNN and LSTM is used [2]. The CNN is used for analyzing frames and detecting patterns, while the LSTM makes predictions on the temporal data. The CNN layer extracts features from the keypoints and passes them to the LSTM cells, which examine variations in the features over multiple frames. The shape of the CNN input is 45 x 18 x 2, which represents 45 frames, 18 keypoints in each frame, and 2 coordinates (X and Y) for each keypoint. The convolution layers are time distributed, which sends the output of the CNN over 45 frames (1.5 seconds) of data to the LSTM as a sequence. The time-distributed layer is beneficial for activities that involve movement and is therefore used. The CNN output is flattened into a one-dimensional vector and given as input to the LSTM layer, which has 20 units, each unit having a forget bias of 0.5. The temporal changes in the features extracted by the CNN are detected by the LSTM, which helps leverage the sequential nature of the input video data, thereby treating the entire yoga pose, from its formation through the changes in pose to its release, as a complete activity. The rest of the CNN architecture is the same as model 2. A diagrammatic representation of the model is shown in Fig. 15 [2].

Fig 18. Model Architecture Diagram

A. Results:
Train accuracy: 0.9992
Validation accuracy: 0.9987
Test accuracy: 0.9938
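The 45 x 18 x 2 input described above can be assembled by grouping per-frame keypoints into fixed-length sequences. The sketch below shows one way to do this in plain Python. The function name `make_windows`, the non-overlapping step, and the dummy all-zero clip are assumptions for illustration; the text does not say whether the original windows overlap.

```python
def make_windows(frames, window=45, step=45):
    """Group per-frame keypoints into fixed-length sequences.

    frames: list of frames, each a list of 18 [x, y] keypoints.
    Returns a list of windows, each of shape window x 18 x 2.
    Trailing frames that do not fill a whole window are dropped.
    """
    return [frames[i:i + window]
            for i in range(0, len(frames) - window + 1, step)]


# Hypothetical 100-frame clip with dummy keypoints (all zeros).
clip = [[[0.0, 0.0] for _ in range(18)] for _ in range(100)]
windows = make_windows(clip)

# (number of windows, frames per window, keypoints, coordinates)
shape = (len(windows), len(windows[0]),
         len(windows[0][0]), len(windows[0][0][0]))

# Frame rate implied by "45 frames (1.5 seconds)".
fps = 45 / 1.5
```

Here the 100-frame clip produces two full windows and the last 10 frames are discarded; a smaller `step` would give overlapping sequences instead.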


B. Analysis
The classification scores are close to 1, indicating near-perfect classification for all classes. The diagonal of the normalized confusion matrix is 1.0 for all classes except vrikshasana, for which it is 0.99. The number of misclassifications for vrikshasana is only 177, which is considerably lower than in the previous two models. Additionally, the model accuracy and model loss curves show a good fit without any fluctuations.

XIII. CONCLUSION

Human pose estimation has been studied extensively over the past years. Compared with other computer vision problems, human pose estimation is different in that it has to localize and assemble human body parts on the basis of an already defined structure of the human body. Applying pose estimation in fitness and sports can help prevent injuries and improve the performance of people's workouts. As [3] suggests, yoga self-instruction systems carry the potential to make yoga popular while making sure it is performed the right way. Deep learning methods are promising because of the vast research being done in this field. The use of a hybrid CNN and LSTM model on OpenPose data is seen to be highly effective and classifies all the 6 yoga poses perfectly. A basic CNN and SVM also perform well beyond our expectations. The performance of SVM shows that ML algorithms can also be used for pose estimation or activity recognition problems. Additionally, SVM is much lighter and less complex than a neural network and requires less training time.

XIV. FUTURE WORK

The proposed models currently classify only 6 yoga asanas. There are many yoga asanas, and hence creating a pose estimation model that can be successful for all the asanas is a challenging problem. The dataset can be expanded by adding more yoga poses performed by individuals in indoor as well as outdoor settings. The performance of the models depends upon the quality of OpenPose pose estimation, which may not perform well in cases of overlap between people or overlap between body parts. A portable device for self-training and real-time predictions can be implemented for this system. This work demonstrates activity recognition for practical applications. An approach similar to this can be used for pose recognition in tasks such as sports, surveillance, healthcare, etc. Multi-person pose estimation is a whole new problem in itself and has a lot of scope for research. There are many scenarios where single-person pose estimation would not suffice; for example, pose estimation in crowded scenarios would involve multiple persons, which requires tracking and identifying the pose of each individual. Many factors, such as background, lighting and overlapping figures, which have been discussed earlier in this survey, would further make multi-person pose estimation challenging.


The approach of using CNN and LSTM on pose data obtained from OpenPose for yoga pose identification has been found to be highly effective. The system recognizes the six asanas on recorded videos as well as in real time for 12 individuals (five males and seven females). Different individuals were used for data collection and real-time testing. The system successfully detects yoga poses in a video with 99.04% accuracy framewise and 99.38% accuracy after polling of 45 frames. The system achieved 98.92% accuracy in real time for a set of 12 different people, showing its ability to perform well with new subjects and conditions. It must be noted that our approach eliminates the need for Kinect or any other specialized hardware for yoga pose identification and can be implemented on input from a regular RGB camera. In future work, more asanas and a larger dataset comprising both images and videos can be included. Additionally, the system can be implemented on a portable device for real-time predictions and self-training. This work serves as a demonstration of activity recognition

Fig 19. Predictions of asanas in real time (top to bottom row): Bhujangasana (column 2 has the wrong prediction), Padmasana, Shavasana, Trikonasana, and Vrikshasana

Fig 20. Continued
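The difference between framewise accuracy and accuracy after polling of 45 frames can be illustrated with a simple majority vote over blocks of framewise predictions. This is a minimal sketch that assumes polling means taking the most frequent label in each block of 45 frames; the text does not spell out the exact polling rule, and the function name `poll` and the sample predictions are hypothetical.

```python
from collections import Counter


def poll(frame_predictions, window=45):
    """Majority vote over consecutive blocks of framewise predictions.

    Blocks shorter than `window` at the end of the list are ignored.
    """
    polled = []
    for i in range(0, len(frame_predictions) - window + 1, window):
        block = frame_predictions[i:i + window]
        label, _ = Counter(block).most_common(1)[0]
        polled.append(label)
    return polled


# Hypothetical framewise output: 5 frames of the first block are wrong.
frame_preds = ["tadasana"] * 40 + ["vrikshasana"] * 5 + ["vrikshasana"] * 45
print(poll(frame_preds))  # ['tadasana', 'vrikshasana']
```

In this example 5 of the first 45 frames are mislabeled, yet the polled label for the block is still correct, which is how polling can raise accuracy above the framewise figure.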


Fig 20. Predictions of asanas on recorded videos (top to bottom row): Bhujangasana, Padmasana, Shavasana, Tadasana (column 2 has a wrongly predicted sequence), Trikonasana, and Vrikshasana (column 1 has a wrongly predicted sequence)

XV. REFERENCES

[1]. L. Sigal, “Human pose estimation”, Ency. of Comput. Vision, Springer, 2011.
[2]. S. Yadav, A. Singh, A. Gupta, and J. Raheja, “Real-time yoga recognition using deep learning”, Neural Comput. and Appl., May 2019. [Online]. Available: https://doi.org/10.1007/s00521-019-04232-7
[3]. U. Rafi, B. Leibe, J. Gall, and I. Kostrikov, “An efficient convolutional network for human pose estimation”, British Mach. Vision Conf., 2016.
[4]. S. Haque, A. Rabby, M. Laboni, N. Neehal, and S. Hossain, “ExNET: deep neural network for exercise pose detection”, Recent Trends in Image Process. and Pattern Recog., 2019.
[5]. M. Islam, H. Mahmud, F. Ashraf, I. Hossain and M. Hasan, “Yoga posture recognition by detecting human joint points in real time using microsoft kinect”, IEEE Region 10 Humanit. Tech. Conf., pp. 668-67, 2017.
[6]. S. Patil, A. Pawar, and A. Peshave, “Yoga tutor: visualization and analysis using SURF algorithm”, Proc. IEEE Control Syst. Graduate Research Colloq., pp. 43-46, 2011.
[7]. W. Gong, X. Zhang, J. Gonzàlez, A. Sobral, T. Bouwmans, C. Tu, and H. Zahzah, “Human pose estimation from monocular images: a comprehensive survey”, Sensors, Basel, Switzerland, vol. 16, 2016.
[8]. G. Ning, P. Liu, X. Fan and C. Zhan, “A top-down approach to articulated human pose estimation and tracking”, ECCV Workshops, 2018.
[9]. A. Gupta, T. Chen, F. Chen, and D. Kimber, “Systems and methods for human body pose estimation”, U.S. patent, 7,925,081 B2, 2011.
[10]. H. Sidenbladh, M. Black, and D. Fleet, “Stochastic tracking of 3D human figures using 2D image motion”, Proc 6th European Conf. Computer Vision, 2000.
[11]. A. Agarwal and B. Triggs, “3D human pose from silhouettes by relevance vector regression”, Intl Conf. on Computer Vision & Pattern Recogn., pp. 882–888, 2004.
[12]. M. Li, Z. Zhou, J. Li and X. Liu, “Bottom-up pose estimation of multiple person with bounding box constraint”, 24th Intl. Conf. Pattern Recogn., 2018.
[13]. Z. Cao, T. Simon, S. Wei, and Y. Sheikh, “OpenPose: realtime multi-person 2D pose estimation using part affinity fields”, Proc. 30th IEEE Conf. Computer Vision and Pattern Recogn., 2017.
[14]. A. Kendall, M. Grimes, R. Cipolla, “PoseNet: a convolutional network for real-time 6DOF camera relocalization”, IEEE Intl. Conf. Computer Vision, 2015.
[15]. S. Kreiss, L. Bertoni, and A. Alahi, “PifPaf: composite fields for human pose estimation”, IEEE Conf. Computer Vision and Pattern Recogn., 2019.
[16]. P. Dar, “AI guardman – a machine learning application that uses pose estimation to detect shoplifters”. [Online]. Available: https://www.analyticsvidhya.com/blog/2018/06/ai-guardman-machine-learning-application-estimates-poses-detect-shoplifters/
[17]. D. Mehta, O. Sotnychenko, F. Mueller and W. Xu, “XNect: real-time multi-person 3D human pose estimation with a single RGB camera”, ECCV, 2019.
[18]. A. Lai, B. Reddy and B. Vlijmen, “Yog.ai: deep learning for yoga”. [Online]. Available: http://cs230.stanford.edu/projects_winter_2019/reports/15813480.pdf
[19]. M. Dantone, J. Gall, C. Leistner, “Human pose estimation using body parts dependent joint regressors”, Proc. IEEE Conf. Computer Vision Pattern Recogn., 2013.
[20]. A. Mohanty, A. Ahmed, T. Goswami, “Robust pose recognition using deep learning”, Adv. in Intelligent Syst. and Comput, Singapore, pp 93-105, 2017.
[21]. P. Szczuko, “Deep neural networks for human pose estimation from a very low resolution depth image”, Multimedia Tools and Appl, 2019.
[22]. M. Chen, M. Low, “Recurrent human pose estimation”. [Online]. Available: https://web.stanford.edu/class/cs231a/prev_projects_2016/final%20(1).pdf
[23]. K. Pothanaicker, “Human action recognition using CNN and LSTM-RNN with attention model”, Intl Journal of Innovative Tech. and Exploring Eng, 2019.
[24]. N. Nordsborg, H. Espinosa, “Estimating energy expenditure during front crawl swimming using accelerometrics”, Procedia Eng., 2014.
[25]. P. Pai, L. Changliao, K. Lin, “Analyzing basketball games by support vector machines with decision tree model”, Neural Comput. Appl., 2017.
[26]. S. Patil, A. Pawar, A. Peshave, “Yoga tutor: visualization and analysis using SURF algorithm”, Proc. IEEE Control Syst. Grad. Research Colloquium, 2011.
[27]. W. Wu, W. Yin, F. Guo, “Learning and self-instruction expert system for yoga”, Proc. Intl. Work Intelligent Syst. Appl, 2010.
[28]. E. Trejo, P. Yuan, “Recognition of yoga poses through an interactive system with kinect device”, Intl. Conf. Robotics and Automation Science, 2018.
[29]. H. Chen, Y. He, C. Chou, “Computer assisted self-training system for sports exercise using kinetics”, IEEE Intl. Conf. Multimedia and Expo Work, 2013.
[30]. Dataset. [Online]. Available: https://archive.org/details/YogaVidCollected
[31]. Y. Shavit, R. Ferens, “Introduction to camera pose estimation with deep learning”. [Online]. Available: https://arxiv.org/pdf/1907.05272.pdf
[32]. Gao Z, Zhang H, Liu AA et al (2016) Human action recognition on depth dataset. Neural Comput Appl 27:2047–2054. https://doi.org/10.1007/s00521-015-2002-0
[33]. Poppe R (2010) A survey on vision-based human action recognition. Image Vis Comput 28:976–990. https://doi.org/10.1016/j.imavis.2009.11.014
[34]. Weinland D, Ronfard R, Boyer E (2011) A survey of vision-based methods for action representation, segmentation and recognition. Comput Vis Image Underst 115:224–241. https://doi.org/10.1016/j.cviu.2010.10.002
[35]. Halliwell E, Dawson K, Burkey S (2019) A randomized experimental evaluation of a yoga-based body image intervention. Body Image 28:119–127. https://doi.org/10.1016/j.bodyim.2018.12.005
[36]. Patil S, Pawar A, Peshave A et al (2011) Yoga tutor: visualization and analysis using SURF algorithm. In: Proceedings of 2011 IEEE control system graduate research colloquium, ICSGRC 2011, pp 43–46
[37]. Chen HT, He YZ, Hsu CC et al (2014) Yoga posture recognition for self-training. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics), pp 496–505
[38]. Schure MB, Christopher J, Christopher S (2008) Mind–body medicine and the art of self-care: teaching mindfulness to counseling students through yoga, meditation, and qigong. J Couns Dev. https://doi.org/10.1002/j.1556-6678.2008.tb00625.x
[39]. Chen HT, He YZ, Hsu CC (2018) Computer-assisted yoga training system. Multimed Tools Appl 77:23969–23991. https://doi.org/10.1007/s11042-018-5721-2
[40]. Maanijou R, Mirroshandel SA (2019) Introducing an expert system for prediction of soccer player ranking using ensemble learning. Neural Comput Appl. https://doi.org/10.1007/s00521-019-04036-9
[41]. Nordsborg NB, Espinosa HG, Thiel DV (2014) Estimating energy expenditure during front crawl swimming using accelerometers. Procedia Eng 72:132–137. https://doi.org/10.1016/j.proeng.2014.06.024
[42]. Connaghan D, Kelly P, O’Connor NE et al (2011) Multi-sensor classification of tennis strokes. Proc IEEE Sens. https://doi.org/10.1109/icsens.2011.6127084
[43]. Shan CZ, Su E, Ming L (2015) Investigation of upper limb movement during badminton smash. In: 2015 10th Asian Control conference, pp 1–6. https://doi.org/10.1109/ascc.2015.7244605

Authors

Deepak Kumar
Department of Information Technology, Research Scholar, Amity University, Jharkhand, India

Anurag Sinha
Department of Information Technology, Research Scholar, Amity University, Jharkhand, India

Cite this article as :
Deepak Kumar, Anurag Sinha, "Yoga Pose Detection and Classification Using Deep Learning", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN : 2456-3307, Volume 6 Issue 6, pp. 160-184, November-December 2020. Available at
doi : https://doi.org/10.32628/CSEIT206623
Journal URL : http://ijsrcseit.com/CSEIT206623
