Guide Name: Mr. Devendra Kumar Mishra
ANALYSIS OF HINGLISH CONTENT
Department of Computer Science and Engineering (CSE)
TEAM MEMBERS
Ashish Kumar Singh (4th year)
Ayush Prasad (4th year)
Harsh Srivastava (4th year)
Department of Computer Science and Engineering (CSE), ABES Engineering College Ghaziabad
INTRODUCTION: WHAT IS HINGLISH?
“Hinglish” is a common tongue found in casual
conversations, where a combination of Hindi and English
phrases is used together in the same context.
An example would be: jaldi karo guys, or we’ll be late for
the movie. It means: let’s hurry up guys, or we’ll be late
for the movie.
In Natural Language Processing (NLP) parlance, it’s
called code-mixing.
OBJECTIVE
The objective is sentiment analysis of Hinglish tweets, which are written
entirely in Latin script but contain slang words from both English
and Hindi, as commonly used in India.
If only the English part of a tweet is analysed, we might lose the
important sentiment conveyed by the part written in Hindi.
Thus it is important to take into account the sentiment of both
languages.
PROBLEMS FACED IN PREVIOUS WORKS
Handling code-mixed languages is usually harder than handling a
pure language.
The problem with these texts is that the Hindi is written
informally and not in Devanagari, the script in which the language is
originally written.
Hence different people may use different spellings
and different rules when writing such texts.
WHAT IS NATURAL LANGUAGE PROCESSING?
NLP is a branch of Computer Science and Artificial Intelligence that
deals with human languages.
Natural language processing strives to build machines that
understand text or voice data and respond with text
or speech of their own, in much the same way humans do.
TECHNOLOGY REQUIREMENTS
PYTHON
JUPYTER NOTEBOOK/ GOOGLE COLAB
NATURAL LANGUAGE PROCESSING
kTrain
BERT
METHODOLOGY
Pre-processing: One of the first steps, in which hashtags,
mentions and links were removed from the tweets.
Spelling Normalization: Used to catch the spelling nuances in Hinglish.
A single vowel difference in a Hindi word might change its meaning
entirely. Stem words were found using the stemmer
package.
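The two steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the regular expressions, the function names clean_tweet and normalize_spelling, and the sample tweet are all assumptions. The stemmer package mentioned above is not named in the slides, so stemming is left out. The normalization shown only squeezes characters repeated three or more times, which leaves single vowel differences untouched.

```python
import re

# Patterns for the three kinds of tweet noise named in the pre-processing step.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
# Any character repeated three or more times in a row.
REPEAT_RE = re.compile(r"(.)\1{2,}")


def clean_tweet(tweet: str) -> str:
    """Remove links, mentions and hashtags, then collapse whitespace."""
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE):
        tweet = pattern.sub(" ", tweet)
    return " ".join(tweet.split())


def normalize_spelling(token: str) -> str:
    """Lower-case a token and squeeze runs of 3+ identical characters to one.

    Assumption: elongations such as "jaldiii" are spelling noise, while a
    single or doubled vowel may carry meaning and is therefore kept.
    """
    return REPEAT_RE.sub(r"\1", token.lower())


# Hypothetical sample tweet, used only for illustration.
tweet = "@rahul Jaldiii karo guys, movie ke liye late ho jayenge!! #weekend https://t.co/xyz"
cleaned = clean_tweet(tweet)
tokens = [normalize_spelling(t) for t in cleaned.split()]
print(tokens)
```

Links are removed before mentions and hashtags so that an "@" or "#" inside a URL is not treated as a separate token.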
METHODOLOGY (CONT.)
Clustering: In this task we separated the Hindi and the
English portions of the tweet into clusters. One of the main properties
of such texts is that the English and the Hindi parts
generally occur in groups. Hence we first try to isolate
them, using a corpus generated from a dictionary.
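One way to picture this clustering step is a dictionary lookup followed by grouping of consecutive tokens. The sketch below assumes that a token found in an English word list is English and that everything else is Hindi. The tiny ENGLISH_WORDS set and the function names are hypothetical stand-ins for the dictionary-derived corpus.

```python
from itertools import groupby

# Hypothetical stand-in for the dictionary-derived corpus; a real run would
# load a full English word list instead of this handful of words.
ENGLISH_WORDS = {"guys", "or", "we", "will", "be", "late", "for", "the", "movie"}


def tag_language(token: str) -> str:
    """Label a token 'en' if it is in the English word list, else 'hi'."""
    return "en" if token.lower() in ENGLISH_WORDS else "hi"


def cluster_by_language(tokens: list[str]) -> list[tuple[str, list[str]]]:
    """Group consecutive tokens that share the same language label."""
    return [(lang, list(run)) for lang, run in groupby(tokens, key=tag_language)]


tokens = "jaldi karo guys or we will be late for the movie".split()
for lang, run in cluster_by_language(tokens):
    print(lang, " ".join(run))
```

Because groupby only merges adjacent items, each cluster is a contiguous run of one language. This matches the observation that the Hindi and English parts generally occur in groups.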
METHODOLOGY (CONT.)
Training and Prediction
• Using a pre-trained XLM-RoBERTa model.
What is Bidirectional Encoder Representations from Transformers
(BERT)?
• BERT is a Transformer-based language model. XLM-RoBERTa, the
multilingual model of this family used here, is trained on 100 different
languages.
• It is a very powerful tool that broke several records for how
well models can handle language-based tasks.
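A rough sketch of how such a model might be fine-tuned with ktrain, the library listed under Technology Requirements, is given below. It is not the project's actual training code. The function name train_sentiment_model and the values for sequence length, batch size, learning rate and number of epochs are assumptions chosen for illustration. Labels are assumed to be integer indices into class_names.

```python
def train_sentiment_model(x_train, y_train, x_val, y_val, class_names,
                          model_name="xlm-roberta-base", maxlen=128,
                          batch_size=16, learning_rate=2e-5, epochs=2):
    """Fine-tune a pre-trained Transformer on labelled tweets with ktrain.

    x_train / x_val are lists of tweet strings; y_train / y_val are lists of
    integer class indices (an assumption about how the data is labelled).
    """
    # Imported inside the function because ktrain pulls in TensorFlow,
    # which is slow to load and only needed when training actually runs.
    import ktrain
    from ktrain import text

    # Wrap the pre-trained checkpoint and tokenize both data splits.
    t = text.Transformer(model_name, maxlen=maxlen, class_names=class_names)
    train_data = t.preprocess_train(x_train, y_train)
    val_data = t.preprocess_test(x_val, y_val)

    # Build the classifier and fine-tune it with the one-cycle policy.
    learner = ktrain.get_learner(t.get_classifier(), train_data=train_data,
                                 val_data=val_data, batch_size=batch_size)
    learner.fit_onecycle(learning_rate, epochs)

    # The predictor bundles the model with its preprocessing for inference.
    return ktrain.get_predictor(learner.model, preproc=t)


# Example call (commented out because it downloads the model and trains):
# predictor = train_sentiment_model(x_train, y_train, x_val, y_val,
#                                   class_names=["negative", "neutral", "positive"])
# predictor.predict("jaldi karo guys, or we'll be late for the movie")
```

The three class names in the commented example are hypothetical; the slides do not state which sentiment labels were used.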
PROJECT CODE SNIPPETS
THANK YOU