Synopsis KRG
on
EmoRecs: Personalized Activity Recommendation System
Based on Multimodal Emotion Detection Using Facial
Expression and Voice Modulation
Submitted
towards
Pre-registration seminar
For
Doctor of Philosophy
Under the discipline of
Computer Science and Technology
In the faculty of
Science and Technology
of
RASHTRASANT TUKADOJI MAHARAJ NAGPUR UNIVERSITY
Under the Supervision of
Dr. Ujwalla Gawande
Associate Professor
Department of Information Technology, YCCE, Nagpur
At
Research Center
YESHWANTRAO CHAVAN COLLEGE OF ENGINEERING, NAGPUR
Abstract
Suicide due to depression is one of the leading causes of death. Humans have a unique ability
to express and understand emotions through a variety of modes of communication, and a
person's emotions or mood swings can indicate whether they are in a good psychological
state. The most apparent deficiency of today's emotion-capturing systems is their inability to
understand, from facial expressions alone, the emotions of people affected by mental health
disorders or with atypical social-emotional behavior. Emotion recognition can be used in
schools to help students who find it difficult to express their feelings or who have mental
health concerns such as depression, so that teachers or health workers can communicate with
their parents and work through their problems. Workplace technology can likewise identify
employees who are overly stressed and relieve them of their duties [2]. Deep learning for
emotion detection from speech and face has made significant progress in recent years, but
several challenges remain, including limited datasets, variability in data, feature extraction,
and ethical and privacy concerns; these require careful consideration in the design and
deployment of such models. There are many sources of emotion cues, such as facial
expressions, hand gestures, body language, text, and speech [1]. Relying on any single one of
these cues is often insufficient to identify emotions accurately, because human emotions can
change from one second to the next.
Modality refers to a particular mode of recognition. Early research mostly relied on a single
modality for emotion recognition, which does not always give good results because a single
channel carries insufficient information. Multimodal processing improves the overall
recognition process and increases reliability: fusing multiple feature sets and classifiers into
one system produces a comparably more accurate system. This research work combines
speech and facial expressions, as this hybrid mode helps in evaluating emotions more
accurately. The proposed methodology consists of two main parts. The first is a Facial
Expression Recognizer (FER), a technology utilized in fields such as education, gaming,
robotics, and healthcare. The second is Speech Emotion Recognition (SER), which has
numerous uses in industries such as psychology, entertainment, and healthcare and is a
critical component of human-computer interaction. The final result is produced by a hybrid
stage that considers the outputs of both FER and SER, giving a multi-sensory, multimodal
emotion recognition system. Hence, there is a need for an AI/ML system that detects a
person's emotion and provides an appropriate recommendation.
Human communication has two main aspects: verbal (auditory) and nonverbal (visual). In
human-to-human communication of emotion, people use either one or both modalities; a
verbal statement is often reflected as emotion in the facial expression. Visual communication
is particularly effective when auditory speech is degraded by noise, bandwidth limitations,
filtering, or hearing impairment. Facial expression analysis has become a promising research
area with potential applications in human-computer interfaces, talking heads, image
retrieval, and human emotion analysis [1]. Facial expressions communicate emotions and
pain and regulate interpersonal behavior. Anger, disgust, fear, happiness, sadness, and
surprise are considered "basic emotions"; each has a universally recognised facial
expression involving changes in multiple facial regions, which facilitates analysis. Speech is
one of the most efficient media when people communicate with one another, and the field of
speech signal processing has been of great interest to scientists and engineers because of
several exciting applications that make a difference in our day-to-day life.
Distinct brain parts induce different emotions [12]. There are three types of emotional
responses: reactional, hormonal, and automatic [13]. According to psychology, emotions are
responses to stimuli associated with qualitative physiological changes [13]. Two basic
approaches used to study the nature of emotions are the discrete approach and the
multidimensional approach [13].
A. Discrete emotions theory
According to this theory, emotions are distinct and discrete categories, each with its own
ensemble of cognitive, psychological, and behavioral factors. Emotions can be positive or
negative. According to proponents of this hypothesis, there exist a few fundamental emotions
that are generally recognized across cultures. There are six basic emotions, namely
happiness, sadness, anger, surprise, fear, and disgust [14]. Robert Plutchik provided a
comprehensive emotional model called Plutchik's wheel of emotions [15].
Plutchik's wheel consists of eight emotions, namely fear, joy, sadness, trust, anger, surprise,
anticipation, and disgust. Other associated emotions are derived as combinations of these
eight primary emotions at different positional intensities. The intensity of the emotions
increases as we move towards the center of the wheel and decreases towards the edge. Fig. 1
provides an overview of Plutchik's wheel of emotions [15].
B. Multidimensional emotions theory
The 3D emotional space model maps emotions onto continuous dimensions: valence (V,
positive or negative), arousal (A, high or low activation), and dominance (D, feeling in
control or feeling controlled). The 3D emotional space model proposed by Mehrabian and
Russell is shown in Fig. 3 [17].
Fig. 3. 3D VAD emotion model
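As an illustration of how the VAD representation can be used computationally, the following minimal Python sketch places a few discrete emotions at assumed, approximate coordinates in valence-arousal-dominance space and maps a continuous estimate to the nearest label; the numeric values are illustrative assumptions, not values from the cited model.

```python
# A minimal, illustrative sketch of the VAD (valence-arousal-dominance) space.
# The coordinates below are rough, assumed values for illustration only.
from dataclasses import dataclass
import math

@dataclass
class VADPoint:
    valence: float    # negative (-1) to positive (+1)
    arousal: float    # low (-1) to high (+1) activation
    dominance: float  # feeling controlled (-1) to feeling in control (+1)

# Assumed, approximate placements of a few discrete emotions in VAD space.
EMOTION_VAD = {
    "happy":   VADPoint(+0.8, +0.5, +0.4),
    "sad":     VADPoint(-0.6, -0.4, -0.3),
    "angry":   VADPoint(-0.5, +0.7, +0.3),
    "fear":    VADPoint(-0.6, +0.6, -0.6),
    "neutral": VADPoint( 0.0,  0.0,  0.0),
}

def nearest_emotion(v: float, a: float, d: float) -> str:
    """Map a continuous VAD estimate to the closest discrete label."""
    def dist(p: VADPoint) -> float:
        return math.dist((v, a, d), (p.valence, p.arousal, p.dominance))
    return min(EMOTION_VAD, key=lambda k: dist(EMOTION_VAD[k]))

print(nearest_emotion(0.7, 0.4, 0.5))  # -> "happy"
```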
Tools that can assist people in recognizing the emotions of those around them could be very
beneficial in treatment settings as well as in regular social encounters.
Emotion sensing is a technique used to extract human emotions. Over the years, various
methods have been adopted to study human emotions. Physical signals for emotion
recognition include facial expressions, speech, text, gestures, and body postures [5]. Speech
and facial expressions are the most commonly employed mechanisms for emotion
identification among physical signals [5]. As a result, this review is limited to the physical
signals of speech and facial expressions.
Children's health:
The study and analysis of emotions in children can also play a crucial role in health
monitoring. Studies have revealed that emotional development and regulation are crucial in
children. Emotion recognition can help identify students who find it difficult to express their
feelings or who have mental health concerns, such as depression, so that teachers or health
workers can communicate with their parents and work through their problems.
Human-robot interactions:
The rise of AI has boosted the development of human-modeled machines. The applications of
human emotions have attracted researchers to investigate human-machine interfaces and
sentiment analysis. Human-machine interfaces that can infer and understand human emotions
are more successful in human interactions; such models should be able to interpret human
emotions and adapt their behavior appropriately, producing an acceptable reaction to those
sentiments.
Patient assistance:
Emotion can be pivotal in patient monitoring and assistance. Effective analysis of emotion
can help sense and detect loneliness, mood variations, and suicidal cues.
Marketing:
A camera with AI systems in shopping malls can be used to read the real-time emotions of
customers, which may be used for marketing.
Workplace monitoring:
Technology now allows employers to recognize individuals who are overly stressed in the
workplace and relieve them of their duties.
Literature Review
The review is done to gain insight into existing methods and the shortcomings that this work
aims to overcome. A literature review (or literature survey) is a scholarly text that presents
the current understanding of a topic along with its key findings and theoretical and
methodological contributions. The latent qualities of humans that can provide inputs to
systems in various ways have attracted the attention of learners, scientists, and engineers
from all over the world. Facial expressions convey the current mental state of a person, and
most of the time we use nonverbal cues such as hand gestures, facial expressions, and tone of
voice to express feelings in interpersonal communication.
Due to the significance of human behavioral intelligence in computing devices, this work
focuses on facial expressions and speech for emotion recognition in multimodal (audio-video)
signals. A comprehensive literature survey is conducted to study and analyze the existing
multimodal datasets and state-of-the-art methods for human emotion recognition. The study
explores various research issues and challenges in human emotion detection through facial
expressions and speech, and concludes by identifying the research gap.
Imtiyaz Ahmad et al. (2020) [6] explained that the face and speech together are the best tools
for identifying emotions and that scientists continually seek better ways to exploit them;
instead of a unimodal approach using the face alone, speech should also be fused. The
proposed method only communicated the idea of identifying emotions without in-depth
details. Several methods exist for face detection, among them the widely used Viola-Jones
algorithm; a Relative Sub-Image Based feature method was proposed for the face, while an
RBFC method was proposed for speech. An SVM was used as the classifier to identify
emotions, and the LibSVM tool was prescribed for implementation, but no system accuracy
was reported.
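The following minimal Python sketch illustrates the kind of pipeline outlined in [6], using OpenCV's Haar cascade implementation of the Viola-Jones detector and a scikit-learn SVM; the flattened-pixel features stand in for the paper's Relative Sub-Image Based features, and the training data and image path are hypothetical placeholders.

```python
# A minimal sketch of the pipeline outlined in [6]: Viola-Jones (Haar cascade)
# face detection followed by an SVM emotion classifier. Raw-pixel features and
# random training arrays are placeholders, not the paper's actual features.
import cv2
import numpy as np
from sklearn.svm import SVC

# OpenCV ships Haar cascades implementing the Viola-Jones detector.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def extract_face_features(gray_image):
    """Detect the largest face and return a flattened 48x48 patch as features."""
    faces = cascade.detectMultiScale(gray_image, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])   # largest detection
    patch = cv2.resize(gray_image[y:y + h, x:x + w], (48, 48))
    return patch.flatten().astype(np.float32) / 255.0

# Placeholder training data: X_train would hold feature vectors, y_train labels.
X_train = np.random.rand(20, 48 * 48).astype(np.float32)   # dummy features
y_train = np.random.randint(0, 6, size=20)                  # dummy emotion ids
clf = SVC(kernel="rbf").fit(X_train, y_train)

# Predict on one image (the path is hypothetical).
img = cv2.imread("sample_face.jpg", cv2.IMREAD_GRAYSCALE)
if img is not None:
    feat = extract_face_features(img)
    if feat is not None:
        print("Predicted emotion id:", clf.predict(feat.reshape(1, -1))[0])
```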
Humaid Alshamsi et al. (2018) [7] identified emotions in real time in a mobile application
built for Android. The SAVEE and RAVDESS datasets were used, yielding approximately
97% accuracy. Cloud technology is used, MFCC techniques are applied for feature extraction
from the audio input, and an SVM is used as the classifier in a multimodal setup. For
implementation, MATLAB and Android features are combined for better accuracy.
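A minimal Python sketch of an MFCC-plus-SVM speech emotion classifier in the spirit of [7] is given below, using librosa and scikit-learn rather than the MATLAB/Android toolchain of the original work; the file names and labels are hypothetical placeholders.

```python
# A minimal sketch of the MFCC + SVM approach reported in [7]. File paths and
# labels are placeholders; real features would come from SAVEE/RAVDESS clips.
import librosa
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def mfcc_features(path, n_mfcc=13):
    """Load an utterance and summarize its MFCCs as a fixed-length vector."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)        # (n_mfcc, frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])  # (2*n_mfcc,)

# Hypothetical training lists; in practice these would be dataset utterances.
train_files = ["happy_01.wav", "sad_01.wav"]
train_labels = ["happy", "sad"]

X = np.stack([mfcc_features(f) for f in train_files])
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
model.fit(X, train_labels)

print(model.predict(mfcc_features("test_utterance.wav").reshape(1, -1)))
```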
Emotion recognition is a technology that enables computers to recognize and interpret human
emotions by analyzing facial expressions, voice, text, or physiological signals. It finds
applications in human-computer interaction, mental health assessment, and personalized
content recommendation, offering insights into user sentiment and engagement. The authors
of [1] introduced an innovative approach to emotion recognition that combines facial
expressions and speech cues within a multimodal system. The fusion of the two modalities is
achieved through two specific methods: feature-level fusion and decision-level fusion. To
evaluate the effectiveness of their approach, they conducted experiments on the
eNTERFACE'05 dataset. Comparative analysis reveals that fusion-based techniques can
substantially enhance the performance of emotion recognition systems, and their findings
highlight the superiority of feature-level fusion over decision-level fusion in terms of overall
performance. In [2], a deep learning algorithm is utilized to create an integrated tool that
identifies facial emotions and the stress level or emotion quotient from speech.
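The two fusion strategies compared in [1] can be sketched as follows; the feature vectors, class probabilities, and weights below are illustrative placeholders, not values from the cited experiments.

```python
# Feature-level fusion concatenates the audio and visual feature vectors before
# a single classifier, while decision-level fusion combines per-modality class
# probabilities. All inputs here are toy placeholders for 6 emotion classes.
import numpy as np

def feature_level_fusion(face_feat, speech_feat):
    """Concatenate modality features into one joint vector for a single classifier."""
    return np.concatenate([face_feat, speech_feat])

def decision_level_fusion(face_probs, speech_probs, w_face=0.5):
    """Weighted average of per-modality class probabilities; returns class index."""
    fused = w_face * face_probs + (1.0 - w_face) * speech_probs
    return int(np.argmax(fused))

face_probs = np.array([0.1, 0.6, 0.1, 0.1, 0.05, 0.05])    # e.g. from a FER model
speech_probs = np.array([0.2, 0.3, 0.3, 0.1, 0.05, 0.05])  # e.g. from an SER model
print("Fused class:", decision_level_fusion(face_probs, speech_probs))
```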
Malyala Divya et al. (2019) [9] proposed automated live facial emotion recognition using
image processing (IP) and artificial intelligence (AI) techniques. Emotion is detected by
scanning static images, and features such as the eyes, nose, and mouth are extracted for face
detection. The system is designed by initializing a CNN model that takes an input image and
passes it through convolution layers, pooling layers, flattened layers, and dense layers.
Convolution layers were added for better accuracy on large datasets, which were collected
from a CSV file and then converted into images; finally, emotions were classified into
expressions such as disgust, happy, surprise, neutral, sad, angry, and fear, with 34,488
images selected for training and 1,250 for testing. An overall accuracy of 66% was achieved.
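A minimal Keras sketch of such a CNN (convolution, pooling, flatten, and dense layers over 48x48 grayscale faces with seven emotion classes) is shown below; the layer sizes and input resolution are assumptions rather than the exact configuration reported in [9].

```python
# A minimal CNN sketch with the layer types described in [9]: convolution,
# pooling, flatten, and dense layers. Sizes are assumed, not the paper's values.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_fer_cnn(num_classes=7):
    model = models.Sequential([
        layers.Input(shape=(48, 48, 1)),            # 48x48 grayscale face patch
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_fer_cnn()
model.summary()
# model.fit(train_images, train_labels, validation_data=(val_images, val_labels))
```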
Lakshmi Bhadana et al. (2020) [8] presented an approach that uses a CNN as the classifier
and a Haar cascade for real-time facial expression recognition across different people. The
system achieves an accuracy of 58% on test data captured with the system's webcam,
displays the detected emotion as text, and successfully classifies seven different human
emotions. Many factors must be kept in mind while recognizing emotions, such as camera
level and lighting conditions, which can deviate the accuracy; real-time images produce
better results than still images.
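The following minimal OpenCV sketch illustrates a real-time loop of this kind: Haar-cascade face detection on webcam frames, a CNN prediction on the cropped face, and the emotion label drawn as text; the model file name and the emotion label order are assumptions.

```python
# A minimal real-time loop in the spirit of [8]: Haar-cascade detection on
# webcam frames, a CNN prediction on the cropped face, emotion drawn as text.
import cv2
import numpy as np
from tensorflow.keras.models import load_model

EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]  # assumed order
model = load_model("fer_cnn.h5")     # hypothetical path to a trained 48x48 FER CNN
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)            # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
        face = cv2.resize(gray[y:y + h, x:x + w], (48, 48)) / 255.0
        probs = model.predict(face.reshape(1, 48, 48, 1), verbose=0)[0]
        label = EMOTIONS[int(np.argmax(probs))]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, label, (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
    cv2.imshow("Real-time FER", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):   # press q to quit
        break
cap.release()
cv2.destroyAllWindows()
```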
Xiang Feng et al. (2020) [10] proposed an academic emotion analysis technology based on
artificial intelligence methods for online learning, which helps researchers study learner
well-being through academic emotions. A framework is proposed for student comment aspect
classification (classifying comments into teacher, course, and platform aspects) and academic
emotion classification; based on it, a machine learning dataset was produced and an analysis
framework fusing LSTM-ATT and A-CNN was developed. The proposed aspect classification
model and academic emotion classification model proved superior to general machine
learning models and conventional LSTM networks, with accuracies of 88.62% and 71.12%,
respectively.
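A minimal Keras sketch of an LSTM-based comment classifier in the spirit of [10] is given below; the attention mechanism (LSTM-ATT) and the A-CNN branch are omitted for brevity, and the vocabulary size, sequence length, and class count are assumptions.

```python
# A minimal LSTM text classifier sketch for comment aspect classification.
# Attention (LSTM-ATT) and the A-CNN branch from [10] are omitted; all sizes
# below are assumed values for illustration.
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE, MAX_LEN, NUM_CLASSES = 10000, 100, 3   # e.g. teacher/course/platform

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),                # padded token-id sequences
    layers.Embedding(VOCAB_SIZE, 128),
    layers.LSTM(64),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```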
Nahla Nour et al. (2020) [11] proposed facial expression recognition using CNN models with
an SVM classifier. Three models were used: AlexNet, VGG-16, and ResNet. The CK+ dataset
was used to check and predict results. The study describes the CNN layers and their uses,
including the convolution layer, pooling layer, fully connected layer, softmax classifier, and
sigmoid function. Recognition is classified and evaluated by training and testing on the data,
and the AlexNet model was judged to have the highest accuracy.
Moe Moe Htay (2021) [18] presents a survey of feature extraction and classification methods
for facial expressions. The steps considered are detection of facial components, extraction of
features from the face image, and classification of expressions. Both geometric and
appearance-based features are discussed for feature extraction, and both spontaneous and
posed datasets are considered. The images take different forms, such as peak expressions,
portrayed expressions, and video clips. CK+ and JAFFE are the databases on which such
systems are checked. Such systems are helpful in healthcare and patient monitoring.
N. Anantha Rufus et al. (2022) [19] discuss that many current applications need human
emotion recognition from speech, such as behavior assessment, call centers, emergency
centers, and virtual assistants. The use of the MFCC algorithm with an LSTM or CNN gives
the best results in this context. The RAVDESS and TESS datasets are used, the concept is
implemented using Python and OpenCV, and results of nearly 90% accuracy are achieved.
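A minimal sketch of an MFCC-plus-LSTM speech emotion pipeline of the kind described in [19] is shown below; the frame count, layer sizes, and dataset paths are assumptions, with features that would in practice be computed from RAVDESS or TESS utterances.

```python
# A minimal MFCC + LSTM speech-emotion sketch in the spirit of [19]. Frame
# counts, layer sizes, and file paths are assumed values for illustration.
import librosa
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

N_MFCC, MAX_FRAMES, NUM_CLASSES = 40, 200, 6

def mfcc_sequence(path):
    """Return a fixed-size (MAX_FRAMES, N_MFCC) MFCC sequence for one utterance."""
    y, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC).T    # (frames, N_MFCC)
    if mfcc.shape[0] < MAX_FRAMES:                               # pad short clips
        mfcc = np.pad(mfcc, ((0, MAX_FRAMES - mfcc.shape[0]), (0, 0)))
    return mfcc[:MAX_FRAMES]

model = models.Sequential([
    layers.Input(shape=(MAX_FRAMES, N_MFCC)),
    layers.LSTM(128),
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(np.stack([mfcc_sequence(p) for p in train_paths]), train_labels)
```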
The summary of these articles used in the review analysis is shown in Table A.
Table A. Summary of emotion recognition studies using SPEECH signals included in the review.
Highlights of speech-based emotion recognition, as evident from the summary in Table A: one
article each has been included from the years 2019, 2020, and 2022. Audio/video or
audio-only stimuli have been used most for emotion elicitation. The dataset analysis reveals
that the EMO-DB, RAVDESS, CASIA, and IEMOCAP datasets have been the most preferred
choices for model testing. The greatest strength of speech-based emotion recognition is that
multiple datasets have been used for method verification, and public speech emotion datasets
have been selected over private ones. Power spectral density (PSD), Mel-frequency cepstral
coefficients (MFCC), Mel spectrogram (MSG), STFT, and variants of the wavelet transform
(WT) have been adopted most for feature extraction. Model validation using holdout
cross-validation (CV) was preferred most for speech, followed by k-fold CV (k-FCV), with
leave-one-subject-out (LOSO) CV used least. The review reveals that DL models have an
edge over ML models for speech-based emotion recognition.
The review included 15 articles on the recognition of emotions using facial images. Table B
presents a summary of facial image-based emotion recognition.
Table B. Summary of emotion recognition studies using IMAGE signals included in the review.
The summary provided in Table B reveals that the highest number of articles are from the
years 2019 to 2023. The CK+ and JAFFE datasets have been the most commonly used facial
image datasets; FER2013, RAF-DB, and AffectNet have also been used in many studies. The
facial image-based emotion recognition studies have validated their models on multiple
datasets, and the majority of the facial image datasets are publicly available. Features based
on the geometry or texture of facial patterns are preferred. Model validation using holdout
CV, followed by k-FCV strategies, is most common. Regarding the distribution of
decision-making models for facial images, out of 15 articles, 12 preferred DL models for
classification and 3 used ML models. Among ML models, the SVM classifier has been the
most preferred, while CNNs have an edge over other DL models.
Most of the datasets used for emotion recognition are publicly available. However, the
majority of them have already been utilized to their maximum capacity, resulting in the
highest achievable classification accuracy. In addition, the available datasets have been
acquired with a single modality, i.e., either EEG, ECG, eye tracking (ET), galvanic skin
response (GSR), speech, or facial images. Therefore, there still exists a research gap in
analyzing emotion recognition using multiple modalities from the same subject. Also, the
lack of publicly available emotion datasets for healthcare, brain-computer interfaces, and
other applications limits such analysis.
Research has progressed well in reducing the gap between machines and humans, and over
the years the field of emotion computing has achieved tremendous success. However,
computing devices with artificial emotional intelligence methods still encounter unresolved
issues and challenges in effectively detecting human feelings via facial expressions and
speech. The following discussion describes the issues and challenges in multimodal human
emotion recognition.
Limited Dataset: The availability of labeled datasets for emotion detection is limited,
especially for less common emotions or for specific cultural contexts. This makes it
challenging to train deep learning models that can generalize well to new data.
Variability in Data: The data used for emotion detection can vary widely in terms of quality,
noise, and variability. For example, speech data can be affected by environmental noise,
accents, and speaking styles, while facial data can be affected by lighting conditions, facial
expressions, and occlusion.
Feature Extraction: Extracting relevant features from speech and facial data can be
challenging, especially when dealing with complex emotions that are not easily captured by
simple features. This requires careful design of feature extraction algorithms and feature
engineering techniques.
Interpretability: Deep learning models are often seen as “black boxes” that are difficult to
interpret. This can make it challenging to understand how the model is making decisions and
to diagnose errors or biases in the model.
Ethical and Privacy Concerns: Emotion detection using speech and facial data raises ethical
and privacy concerns, as it can be used for sensitive applications such as surveillance,
emotion profiling, and behavioral prediction.
Lack of effort to analyze and handle person-dependent attributes in speech and facial
expressions to compute the most relevant and generalized information for emotion detection:
People have unique vocal tract and facial expression patterns when they talk, sing, laugh, cry,
and perform other activities. Their facial profiles differ from one another in skin color and in
the shape and size of the face, eyes, eyebrows, nose, cheeks, mouth, chin, and hair color [15].
Their voiced speech also has different characteristics: adult males typically speak in the range
of 85 to 180 Hz, whereas adult females speak in the range of 165 to 255 Hz [16, 17]. Face and
voice attributes also vary over time as people get older. This within-person and between-person
diversity of speech and facial expressions makes the development of a generalized system
challenging.
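As a small illustration of these speaker-dependent voice attributes, the following sketch estimates the median fundamental frequency of an utterance with librosa's pYIN estimator and compares it against the typical ranges cited above; the file path is hypothetical and the simple threshold ignores the overlap between the two ranges.

```python
# Illustration of the speaker-dependent pitch ranges cited above (roughly
# 85-180 Hz for adult males, 165-255 Hz for adult females). The file path is
# hypothetical and the single threshold at 165 Hz ignores the range overlap.
import librosa
import numpy as np

y, sr = librosa.load("utterance.wav", sr=16000)
f0, voiced_flag, _ = librosa.pyin(y, fmin=65, fmax=300, sr=sr)
median_f0 = float(np.nanmedian(f0[voiced_flag]))   # median over voiced frames

if median_f0 < 165:
    print(f"Median F0 {median_f0:.0f} Hz: closer to the typical adult-male range")
else:
    print(f"Median F0 {median_f0:.0f} Hz: closer to the typical adult-female range")
```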
Research Gap
A vast amount of work has already been done in the field of emotion detection using either
facial expression or speech emotion recognition systems, but only a few systems use a hybrid
approach for better results. Hybrid here means deriving the emotion from both facial
expressions and speech. Most systems use deep convolutional neural networks for emotion
detection on various datasets, and good accuracy has been achieved by such systems.
Problem Definition
I. The majority of the review articles previously published for emotion recognition
focus on a single modality, i.e., either a physiological signal, speech, or facial images.
II. No existing work addresses a recommendation system for multimodal emotion
detection using voice and facial expressions.
III. To propose a Multimodal Emotion Recognition (MER) system that can adaptively
integrate the most discriminating features from facial expressions and speech to
improve performance.
The aim is to develop a system that detects emotions and, based on the detected emotion,
generates a customized response to enhance the user's mental health. The objectives are:
1. To study and analyze the existing audio-video emotion detection methods and datasets.
2. To develop the multimodal dataset.
3. To extract and propose the feature vector for emotion recognition in tone-sensitive
speech and facial expressions.
4. To propose a multimodal system through the fusion of peak-stage behavior of facial
expressions and speech for emotion recognition.
5. To design an innovative classification model based on deep learning to derive a more
appropriate classification of different emotions (happy, sad, neutral, angry, excited, and
frustrated), which may be useful in many real-world applications.
6. To validate and compare the performance of the proposed system with existing work.
7. To recommend activities based on the recognized emotion, such as listening to music
(devotional, instrumental, etc.), watching a movie, acupressure therapy, yoga, or quotes;
a minimal sketch of such a mapping is given below.
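The recommendation step in objective 7 can be sketched as a simple rule-based mapping from the recognized emotion to suggested activities; the activity lists below are illustrative placeholders, not a validated therapeutic mapping.

```python
# A minimal rule-based mapping from a recognized emotion to suggested activities.
# The activity lists are illustrative placeholders only.
RECOMMENDATIONS = {
    "happy":      ["upbeat or instrumental music", "motivational quotes"],
    "sad":        ["devotional music", "a light-hearted movie", "yoga"],
    "neutral":    ["instrumental music", "a documentary"],
    "angry":      ["breathing exercises", "acupressure therapy", "calming music"],
    "excited":    ["energetic music", "a workout session"],
    "frustrated": ["yoga", "meditation", "inspirational quotes"],
}

def recommend(emotion):
    """Return activity suggestions for a recognized emotion label."""
    return RECOMMENDATIONS.get(emotion.lower(), ["take a short walk"])

print(recommend("Sad"))   # -> ['devotional music', 'a light-hearted movie', 'yoga']
```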
Proposed Research Methodology
There exist three main types of state-of-the-art methodologies for emotion recognition in
affective computing: (a) the image/video unimodal approach, (b) the audio/linguistic
unimodal approach, and (c) the audio+video multimodal approach. Researchers have made
many efforts to improve the accuracy and reduce the computational cost of emotion
recognition methods. The image/video unimodal approach recognizes human emotions
through facial expressions only and does not consider sound information. On the other hand,
the audio/linguistic unimodal approach detects human emotion through speech only and does
not consider the visual information of facial expressions. However, people convey their
feelings via both facial expressions and speech: they may use both simultaneously or use
only one of these channels to communicate their feelings. Thus, this research work focuses
on the joint processing of audio and video modalities for emotion recognition through facial
expressions and speech. The multimodal emotion recognition approach consists of five steps:
(1) multimodal dataset, (2) preprocessing, (3) feature extraction, (4) fusion and classification,
and (5) recommendation based on the predicted emotion. The general pipeline for multimodal
emotion recognition via facial expressions and speech is presented in Figure 4.
Fig. 4. General pipeline for multimodal emotion recognition via facial expressions and
speech, ending with a recommendation based on the predicted emotion (happy, sad, neutral,
angry, excited, or frustrated), such as listening to music (devotional, instrumental, etc.),
watching a movie, acupressure therapy, yoga, or quotes.
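To make the five-step pipeline concrete, the following skeleton sketches how the stages could be wired together under the assumption that trained unimodal feature extractors and a fusion classifier are available; every function body here is a placeholder.

```python
# A skeleton of the five-step pipeline in Figure 4. All function bodies are
# placeholders standing in for trained components; only the data flow
# (features -> feature-level fusion -> classification -> recommendation) is real.
import numpy as np

EMOTIONS = ["happy", "sad", "neutral", "angry", "excited", "frustrated"]

def face_features(frame):
    """Placeholder: preprocess the frame, detect the face, return an embedding."""
    return np.zeros(128)                      # e.g. a 128-d CNN embedding

def speech_features(audio_path):
    """Placeholder: preprocess the audio and return summarized MFCC statistics."""
    return np.zeros(26)                       # e.g. 13 MFCC means + 13 std devs

def classify(fused):
    """Placeholder: a trained fusion classifier would output class probabilities."""
    probs = np.ones(len(EMOTIONS)) / len(EMOTIONS)
    return EMOTIONS[int(np.argmax(probs))]

def recommend(emotion):
    """Placeholder mapping from the predicted emotion to a suggested activity."""
    table = {"sad": "devotional music or yoga", "angry": "acupressure therapy"}
    return table.get(emotion, "instrumental music")

def pipeline(frame, audio_path):
    fused = np.concatenate([face_features(frame), speech_features(audio_path)])
    emotion = classify(fused)
    return emotion, recommend(emotion)

print(pipeline(frame=None, audio_path="utterance.wav"))   # hypothetical inputs
```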
Semester   Planned research activity
I          Literature survey
II         Collect multimodal dataset and propose algorithm
III        Image preprocessing and feature extraction
IV         Feature fusion, classification, and recommendation
V          Validate and compare the performance of the proposed system with existing work
VI         Thesis writing
References
[1] Pragya Singh Tomar, Kirti Mathur, and Ugrasen Suman, "Fusing facial and speech cues
for enhanced multimodal emotion recognition," International Journal of Information
Technology, vol. 16, pp. 1397-1405, Jan. 2024.
[2] S. Shajith Ahamed, J. Jabez, and M. Prithiviraj, "Emotion Detection using Speech and
Face in Deep Learning," International Conference on Sustainable Computing and Smart
Systems (ICSCSS), Coimbatore, India, IEEE Xplore, pp. 317-321, 14-16 June 2023.
[3] Naveed Ahmed, Zaher Al Aghbari, and Shini Girija, "A systematic survey on multimodal
emotion recognition using learning algorithms," Intelligent Systems with Applications,
vol. 17, Elsevier, February 2023.
[4] J. Sun, J. Han, Y. Wang, and P. Liu, "Memristor-based neural network circuit of emotion
congruent memory with mental fatigue and emotion inhibition," IEEE Trans. Biomed.
Circuits Syst., vol. 15, no. 3, pp. 606-616, 2021.
[5] L. Shu, J. Xie, M. Yang, Z. Li, Z. Li, D. Liao, X. Xu, and X. Yang, "A review of emotion
recognition using physiological signals," Sensors, vol. 18, no. 7, p. 2074, 2018.
[6] Imtiyaz Ahmad, Ramendra Pathak, Yaduvir Singh, and Jameel Ahamad, "Emotion
Detection using Facial Expression and Speech Recognition," International Journal of Future
Generation Communication and Networking, vol. 13, no. 3, pp. 123-133, 2020.
[7] Humaid Alshamsi, Veton Kepuska, Hazza Alshamsi, and Hongying Meng, "Automated
Facial Expression and Speech Emotion Recognition App Development on Smart Phones
using Cloud Computing," 9th IEEE Annual Information Technology, Electronics and Mobile
Communication Conference (IEMCON), November 2018, DOI:
10.1109/IEMCON.2018.8614831.
[8] Lakshmi Bhadana, P. V. Lakshmi, D. Rama Krishna, G. Surya Bharti, and Y. Vaibhav,
"Real Time Facial Emotion Recognition With Deep Convolutional Neural Network," Journal
of Critical Reviews, ISSN 2394-5125, vol. 7, no. 19, 2020.
[9] Malyala Divya, R. Obula Konda Reddy, and C. Raghavendra, "Effective Facial Emotion
Recognition using Convolutional Neural Network Algorithm," International Journal of
Recent Technology and Engineering (IJRTE), ISSN 2277-3878, vol. 8, no. 4, November
2019.
[10] Xiang Feng, Yaojia Wei, Xianglin Pan, Longhui Qiu, and Yongmei Ma, "Academic
Emotion Classification and Recognition Method for Large-scale Online Learning
Environment - Based on A-CNN and LSTM-ATT Deep Learning Pipeline Method,"
International Journal of Environmental Research and Public Health.
[11] Nahla Nour, E. Mohammed, and V. Serestina, "Face expression recognition using
convolution neural network (CNN) models," International Journal of Grid Computing &
Applications, vol. 11, no. 4, pp. 1-11, 2020.
[12] T. Dalgleish, "The emotional brain," Nat. Rev. Neurosci., vol. 5, no. 7, pp. 583-589,
2004.
[13] T. S. Rached and A. Perkusich, "Emotion recognition based on brain-computer interface
systems," in R. Fazel-Rezai (Ed.), Brain-Computer Interface Systems, IntechOpen, Rijeka,
2013.
[14] P. Ekman, "An argument for basic emotions," Cogn. Emot., vol. 6, no. 3-4, pp. 169-200,
1992.
[15] R. Plutchik and H. Kellerman, Theories of Emotion, Vol. 1, Academic Press, 2013.
[16] G. F. Wilson and C. A. Russell, "Real-time assessment of mental workload using
psychophysiological measures and artificial neural networks," Hum. Factors, vol. 45, no. 4,
pp. 635-644, 2003.
[17] A. Mehrabian, "Pleasure-arousal-dominance: A general framework for describing and
measuring individual differences in temperament," Curr. Psychol., vol. 14, pp. 261-292,
1996.
[18] Moe Moe Htay, "Feature extraction and classification methods of facial expression: A
survey," Computer Science and Information Technologies, vol. 2, no. 1, pp. 26-32, 2021.
[19] N. Anantha Rufus, M. Zaheer, S. A. V. Dolendrakumar, and P. Penchalalokesh, "Speech
Emotion Recognition Using Deep Learning," International Research Journal of
Modernization in Engineering Technology and Science, vol. 4, no. 5, May 2022.
Dr. U. P. Waghe
Principal and Head of the Research Center