
IEEE 2nd International Conference on Communication, Control and Intelligent Systems (CCIS)

A Detailed Survey and Comparative Study of Sentiment Analysis Algorithms
Harsha Sinha
Department of Computer Science & Engineering,
Amity School of Engineering & Technology,
Amity University Uttar Pradesh, Noida, India
E-mail Id: [email protected]

Arashdeep Kaur
Department of Computer Science & Engineering,
Amity School of Engineering & Technology,
Amity University Uttar Pradesh, Noida, India
E-mail Id: [email protected]

978-1-5090-3210-5/16/$31.00 © 2016 IEEE

Abstract - Sentiment Analysis is the process of determining whether the emotion expressed in a piece of writing is positive, negative or neutral, and it is used to identify the writer's attitude. Today, the trend is to consult the opinions of a variety of individuals around the globe, expressed as micro-blogging data, before purchasing an item. Customers tend to go over many reviews of a particular item before buying it, and Sentiment Analysis makes this task easy for them. Sentiment Analysis aims to achieve its function in the simplest manner with the help of an existing approach and an existing algorithm. This paper focuses on the various Sentiment Analysis algorithms available in the existing literature. Further, it presents a comparative study of different Sentiment Analysis algorithms on the basis of accuracy.

Keywords: Sentiment Analysis; Framework; Accuracy; Classification

I. INTRODUCTION

Sentiment Analysis is the process of identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer's attitude towards a particular topic or product is positive, negative, or neutral. With the increasing use of micro-blogging websites such as Twitter, Facebook and other social media, many reviews are made available online every day. These reviews may concern a product or a movie, or they may be independent statements describing a situation. Sentiment analysis is used to classify such statements as positive or negative.

Sentiment Analysis has various benefits. It makes users aware of the positive and negative features of a product and helps them make effective decisions. Furthermore, SA helps companies gather feedback from these reviews and improve their products or services wherever necessary. For example, a person who plans to buy a mobile phone tends to scrutinize multiple review sites to read the reviews other consumers have written. In this manner, the consumer gets an idea of the features he may consider important. Analyzing the reviews available on thousands of sites is a tedious task, and this is where Sentiment Analysis comes into play. It eases the consumer's task of categorizing the text into positive and negative, which further helps in effective decision making. Moreover, some reviews are noisy or misleading and could easily hamper the consumer's decision.

Sentiment Analysis is done at three different levels, namely Document Level, Sentence Level and Feature Level. Document Level sentiment analysis takes the whole document and classifies it into two categories, positive and negative, based on the sentiment expressed by the user. It reduces the whole document to a single score. The analysis is based on four pairs of emotions: "joy : sadness", "acceptance : disgust", "anticipation : surprise" and "fear : anger". The problem with this analysis is that it hides the most useful insights and prevents clients from drilling down to extract them.

Sentence Level Sentiment Analysis takes a sentence and determines whether it expresses a positive, negative or neutral opinion, where neutral usually means no opinion. It is further divided into subjectivity classification and sentiment classification. A sentence can carry two kinds of information: objective and subjective. Subjectivity classification determines which type a sentence is, separating sentences that express factual information from sentences that express subjective views and opinions. Sentiment classification then classifies the subjective sentences as positive or negative.

Feature Level Sentiment Analysis looks at the opinion itself rather than at language constructs (documents, paragraphs, sentences, clauses or phrases). It is based on the idea that an opinion consists of a sentiment (positive or negative) and a target of that opinion. It involves three main tasks: first, extracting features from the web content; next, determining the polarity of the opinion on each feature; and finally, grouping feature synonyms. This type of classification is also known as word/phrase classification.

Document Level and Sentence Level analysis do not recognize every detail of the opinions and facts, and thus Feature Level analysis is widely used. A specific framework is followed throughout the process of Sentiment Analysis. Many researchers have contributed to this field. Most of the works reported by
different researchers are discussed in the literature review presented in this paper.

The main contribution of this paper is an overview of various existing algorithms for sentiment analysis. The paper also presents a comparative study, on the basis of accuracy, of different Sentiment Analysis algorithms used in various fields. It further explains why Sentiment Analysis is necessary and describes the framework of the Sentiment Analysis process in detail.

The rest of this paper is organized as follows. Section II focuses on the Sentiment Analysis framework. Section III presents the literature review. Section IV presents the comparative study, and Section V concludes the paper.

II. SENTIMENT ANALYSIS FRAMEWORK

This section discusses the basic Sentiment Analysis framework, which can be used to judge the emotions expressed in website content. The framework consists of three main steps: data collection, preprocessing of the collected data, and classification, which categorizes the processed data as positive or negative. Fig. 1 gives a basic overview of the sentiment analysis framework.

Fig. 1: Sentiment Analysis Framework

A. Data Collection
Sentiment Analysis can be done on any data. The data can either be taken from an existing data set or extracted from a website. Data sets containing thousands of reviews labeled positive or negative are available online. On the other hand, extracting data from the web is a lengthy task, but it lets one perform sentiment analysis on data of one's own choice [19].

B. Pre-Processing
Data extracted from the web contains several syntactic features that may not be useful, so the data must be cleaned and filtered. It is imperative to pre-process all the data before carrying out further steps [20]. The various pre-processing steps involved are given below:

1) Removing URLs
URLs are of no use while performing sentiment analysis and can sometimes lead to false analysis. For example, the sentence "I have logged in to www.happy.com as I'm bored..." is negative, but because of the positive word in the URL it becomes neutral, leading to a wrong prediction. To avoid such false predictions, URLs must be removed.

2) Filtering
Repeated letters in words like "thankuuuuu" are often used to show the depth of expression. However, these words are absent from the dictionary, so the extra letters need to be eliminated. This is done on the basis of a rule that a letter cannot repeat itself more than three times; any further repetitions are eliminated.

3) Questions
Words like "what", "which", "how" etc. do not contribute to polarity and thus must be removed in order to reduce complexity.

4) Removing special characters
In order to remove discrepancies during the Sentiment Analysis process, special characters like '[ ] { } ( ) /' should be removed. For example, in "it's good:", if such characters are not eliminated before performing sentiment analysis, they get attached to the words and those words will not be recognized. To avoid this situation, such characters must be removed.

5) Removing Stop words and emoticons
Stop words, such as determiners and prepositions (in, to, from, etc.), do not carry much meaning and thus need to be filtered out before proceeding with the SA process. Most of the time, while writing a review, people use emoticons to express their feelings better. Although these emoticons help in understanding the emotions, they can mislead the sentiment analysis and cause wrong predictions.

6) Lemmatization or stemming
Lemmatization and stemming aim to reduce inflectional and related forms of a word to a common base form. Stemming achieves this correctly most of the time by chopping off the ends of words, whereas lemmatization does it properly with the use of a vocabulary and morphological analysis of words.

7) Tokenization
Tokenization refers to splitting a sentence into its desired constituent parts. It is an important step in all NLP tasks.
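
Pre-processing steps 1) to 7) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the regular expressions and the stop-word, question-word and emoticon lists are illustrative choices; emoticons are stripped before other special characters (otherwise ":D" would leave a stray "D"); tokenization runs before stop-word filtering because filtering needs tokens; and step 6 is left out because stemming and lemmatization are better taken from an existing library such as NLTK (`PorterStemmer`, `WordNetLemmatizer`).

```python
import re

# Illustrative lists only (assumption): the paper does not publish the lists it used.
QUESTION_WORDS = {"what", "which", "how", "why", "who", "when", "where"}
STOP_WORDS = {"a", "an", "the", "in", "to", "from", "of", "on", "at", "as",
              "and", "is", "am", "are", "it", "i", "i'm", "have", "has"}

URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
REPEAT_RE = re.compile(r"([A-Za-z])\1{3,}")     # one letter appearing 4+ times in a row
EMOTICON_RE = re.compile(r"[:;=][-']?[()DPp]")  # simple emoticons such as :) ;-( :D =P
SPECIAL_RE = re.compile(r"[^A-Za-z0-9'\s]")     # everything except letters, digits, apostrophes


def preprocess(text: str) -> list[str]:
    """Apply steps 1-5 and 7 of the pre-processing list to one review."""
    # 1) Removing URLs: words inside a URL ("happy") must not sway polarity.
    text = URL_RE.sub(" ", text)
    # 2) Filtering: a letter may not repeat more than three times ("thankuuuuu" -> "thankuuu").
    text = REPEAT_RE.sub(lambda m: m.group(1) * 3, text)
    # 5) Emoticons are removed before step 4, which would otherwise turn ":D" into "D".
    text = EMOTICON_RE.sub(" ", text)
    # 4) Removing special characters so they do not stick to neighbouring words.
    text = SPECIAL_RE.sub(" ", text)
    # 7) Tokenization: lower-case and split on whitespace.
    tokens = text.lower().split()
    # 3) and 5) Drop question words and stop words, which carry little polarity.
    return [t for t in tokens if t not in QUESTION_WORDS and t not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("I have logged in to www.happy.com as I'm bored..."))
    # -> ['logged', 'bored']
    print(preprocess("Thankuuuuu so much :) what a phone!"))
    # -> ['thankuuu', 'so', 'much', 'phone']
```

Note that the URL example from step 1) loses the misleading word "happy", and the repeated-letter rule from step 2) keeps exactly three copies of the letter, as the rule in the text specifies.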

8) Feature selection
Feature selection finds a reduced set of attributes that provides a suitable representation of the data set for a given analysis. This is necessary because the excessive use of slang, irony and mixed languages makes the classification task difficult.

C. Classification
Classification is a technique that assigns data to various categories. In Sentiment Analysis, it is used to classify data into three classes, namely positive, negative and neutral, and this completes the sentiment analysis process. The classification task requires a pre-classified sample of the data, called the training set, which is used to train and generate a classifier. The trained classifier is then used to classify new, unlabeled data. The accuracy of the classifier is highly dependent upon such training data [4]. Different classifiers are available for this task and are discussed below, but the Naive Bayes classifier is the one most commonly used for classification in Sentiment Analysis.

1) Naive Bayes classifier
The Naive Bayes classifier is a supervised machine learning approach [19]. It is based on Bayes' theorem, named after Thomas Bayes. According to this theorem, given two events p1 and p2, the conditional probability that p1 occurs when p2 has already occurred is given by:

$$P(p_1 \mid p_2) = \frac{P(p_2 \mid p_1)\,P(p_1)}{P(p_2)} \tag{1}$$

The algorithm applies this formula to calculate the probability that the data is positive or negative:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \tag{2}$$

where A = sentiment and B = sentence.

The conditional probability of a word given a class A is given by:

$$P(\text{word} \mid A) = \frac{C + 1}{D + E} \tag{3}$$

where
C = number of occurrences of the word in the class,
D = number of words belonging to the class,
E = total number of distinct words in the vocabulary.

2) Maximum entropy classifier
This is another probabilistic classifier, belonging to the class of exponential models. It is similar to the Naive Bayes classifier; however, Naive Bayes assumes that the features are conditionally independent of each other given the class, whereas this algorithm makes no such assumption. The classifier works on the Principle of Maximum Entropy [19]: from all the models that fit the training data, it selects the one with the largest entropy. Apart from sentiment analysis, the Max Entropy classifier is used to solve many text classification problems, such as language detection and topic classification.

3) Support vector machine classifier
The SVM classifier is a supervised learning model with associated learning algorithms that analyze data used for classification and regression analysis. An SVM model represents examples as points in space, mapped so that the examples of separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and assigned a category based on which side of the gap they fall on [19].

This section explained the framework of Sentiment Analysis. The first step explained was data collection, followed by preprocessing, which cleans the data. The preprocessed data is then ready for classification.

III. LITERATURE REVIEW

Sentiment Analysis aims to help customers make effective decisions. Manually analyzing reviews is a difficult task, and Sentiment Analysis helps customers with it. Many researchers have made significant contributions to this field. In this section, a review of the existing and related works on Sentiment Analysis is presented.

Keke Cai et al. presented research focusing on topic detection techniques that detect topics highly correlated with positive and negative opinions [1]. These techniques help business analysts understand the overall sentiment scope as well as the drivers behind the sentiment. They performed basic sentiment classification that categorized the text as positive, negative or neutral, but found that this type of classification lacked insight into what drives these sentiments. To solve this problem, they came up with a new sentiment analysis technique that determines not only the sentiment of a given topic but also the root cause of that sentiment. Prashant Raina came up with an opinion mining engine that uses common-sense knowledge extracted from ConceptNet and SenticNet to perform sentiment analysis of news articles [2]. He used a large corpus of sentences from news articles to test the opinion mining engine. The classification accuracy was 71%, with 91% precision for neutral sentences. Federico Neri et al. described a sentiment study of over 1000 Facebook posts about newscasts, comparing the sentiment towards Rai, the Italian public broadcasting service [3]. Ana C. E. S. Lima et al. proposed an automatic sentiment classifier for tweets containing emoticons or sentiment-based words. They used a naive Bayes
algorithm to classify the tweets. However, the problem with this approach was that it classified the tweets as either positive or negative and did not add a neutral class [4]. Min Wang et al. proposed an approach that realizes polarity analysis of new words and, in addition, implements quantitative computation of sentiment words and automatic expansion of the polarity lexicon [5]. Their experimental results showed the feasibility and effectiveness of their approach.

ZHU Nanli et al. presented a study on recent developments in the field of sentiment analysis. They surveyed three major research fields: framework, feature extraction and sentiment analysis, and noted that there had been no research on the commercial value of online reviews [6]. Seyed-Ali Bahrainian et al. came up with a novel solution for sentiment summarization and SA of short informal texts, with emphasis on tweets [7]. They compared different algorithms and methods for SA polarity detection (PD) and sentiment summarization. However, sarcasm detection is yet to be taken into account.

Seyed-Ali Bahrainian and Andreas Dengel also compared state-of-the-art Sentiment Analysis methods against a novel hybrid method. Their approach trains a linear Support Vector Machine (SVM) classifier, for which they create a new set of features using a sentiment lexicon. The problem they faced was that the classification did not take sarcasm into account [8]. Sunil Kumar Khatri et al. presented research in which they performed classification on data collected from multiple websites and then analyzed the classified data with an Artificial Neural Network (ANN), reducing the prediction error to a minimum. However, their study only predicted the direction of the market for a particular day; they aimed to extend it to predict the closing value for the day [9].

Rui Xia et al. came up with a model called dual sentiment analysis (DSA), and their paper highlighted issues with sentiment classification [10]. For each training and test review, they created a sentiment-reversed review as part of a novel data expansion technique. They developed a dual training algorithm that uses both kinds of reviews together to learn a sentiment classifier, which is then used to classify the test reviews. They then extended the approach from 2-class to 3-class classification by also considering neutral reviews. Finally, they created a pseudo-antonym dictionary, which allowed them to use a corpus-based method. They conducted a wide range of experiments, and the results demonstrate the effectiveness of DSA.

Vee W. Lo et al. discussed the existing works on opinion mining and sentiment classification performed on customer feedback and online reviews, and evaluated the various approaches used for the process [11]. Their paper maps the areas covered by previously evaluated papers, pointing out both the areas already explored by many researchers and the neglected areas of opinion mining and sentiment classification that remain open for future research.

The existing literature shows that many algorithms exist for Sentiment Analysis, but they have some drawbacks, and there is still room for improvement.

IV. COMPARATIVE STUDY

This section presents a comparative study of various algorithms proposed by different researchers. The comparison is based on the accuracy of each algorithm, where accuracy measures how correctly an algorithm categorizes text into positive and negative. Extensive studies have been conducted on similar topics by many researchers, using a variety of innovative and diverse algorithms and techniques. The researchers worked on different datasets [21][22] to measure accuracy, and these datasets contained varying numbers of reviews. This study summarizes, compares and analyzes the works of these researchers.

TABLE 1: COMPARATIVE CHART OF ALGORITHMS

Paper       Approach                                               Accuracy
2012 [4]    Emoticon-based approach                                89.56%
            Word-based approach                                    89.92%
            Hybrid approach                                        91.94%
2013 [12]   SVM                                                    76.78%
            NB                                                     81.07%
2013 [7]    Hybrid approach                                        89.78%
            Unsupervised algorithm                                 81.35%
            Unsupervised PD & ranking algorithm                    82.12%
            SVM (baseline)                                         86.70%
2013 [13]   Lexical-based without stemming                         74.59%
            Lexical-based with stemming                            67.06%
            SVM                                                    81.43%
2014 [14]   Bag of Words (BOW)                                     56%
            SentiWordNet                                           65%
2014 [15]   Semantic analysis (WordNet)                            89.9%
            SVM                                                    85.5%
            Maximum entropy                                        83.8%
            NB                                                     88.2%
2014 [16]   SVM & C3E-SL                                           87.20%
2015 [17]   Naive Bayes                                            88.8%
2015 [18]   Cosine similarity, 2-class sentiment classification    82.09%
            Cosine similarity, 3-class sentiment classification    78.5%

V. CONCLUSION

The large amount of information available online about products, and consumers' heavy reliance on it, has created a need for Sentiment Analysis. In this paper, a comparative study of different Sentiment Analysis algorithms on the basis of accuracy was presented. Furthermore, the paper discussed the various Sentiment Analysis algorithms and
techniques presented by various researchers, and explained the importance of sentiment analysis in detail. It also explained the most commonly used classifier algorithm, the naive Bayes classifier. Experimental results from other researchers show how different Sentiment Analysis algorithms behave when applied to various data sets, yielding different accuracies.

REFERENCES

[1] Keke Cai, Scott Spangler, Ying Chen, Li Zhang, "Leveraging Sentiment Analysis for Topic Detection", IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, 2008, pp. 265-271.
[2] Prashant Raina, "Sentiment Analysis in News Articles Using Sentic Computing", IEEE 13th International Conference on Data Mining Workshops, 2013, pp. 959-962.
[3] Federico Neri, Carlo Aliprandi, Federico Capeci, Montserrat Cuadros, Tomas, "Sentiment Analysis on Social Media", IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2012, pp. 919-926.
[4] Ana C. E. S. Lima, Leandro N. de Castro, "Automatic Sentiment Analysis of Twitter Messages", IEEE, 2012, pp. 52-57.
[5] Min Wang, Hanxio Shi, "Research on Sentiment Analysis Technology and Polarity Computation of Sentiment Words", IEEE, 2010, pp. 331-334.
[6] ZHU Nanli, ZOU Ping, LI Weign, CHENG Meng, "Sentiment Analysis: A Literature Review", in Proceedings of the 2012 IEEE ISMOT, pp. 572-576.
[7] Seyed-Ali Bahrainian, Andreas Dengel, "Sentiment Analysis and Summarization of Twitter Data", IEEE 16th International Conference on Computational Science and Engineering, 2013, pp. 227-234.
[8] Seyed-Ali Bahrainian, Andreas Dengel, "Sentiment Analysis using Sentiment Features", IEEE/WIC/ACM International Conferences on Web Intelligence (WI) and Intelligent Agent Technology (IAT), 2013, pp. 26-29.
[9] Sunil Kumar Khatri, Himanshu Singhal, Prashant Johri, "Sentiment Analysis to Predict Bombay Stock Exchange Using Artificial Neural Network", IEEE, 2014.
[10] Rui Xia et al., "Dual Sentiment Analysis: Considering Two Sides of One Review", IEEE Transactions on Knowledge and Data Engineering, Vol. 27, No. 8, 2015, pp. 2121-2133.
[11] Vee W. Lo, Vidyasagar Potdar, "A Review of Opinion Mining and Sentiment Classification Framework in Social Networks", 3rd IEEE International Conference on Digital Ecosystems and Technologies, 2009, pp. 396-401.
[12] V. K. Singh et al., "Sentiment Analysis of Movie Reviews and Blog Posts: Evaluating SentiWordNet with Different Linguistic Features and Scoring Schemes", 2013 3rd IEEE International Advance Computing Conference (IACC), pp. 893-898.
[13] Warih Maharani, "Microblogging Sentiment Analysis with Lexical Based and Machine Learning Approaches", IEEE, 2013, pp. 439-443.
[14] Farhan Hassan Khan et al., "SentiView: A Visual Sentiment Analysis Framework", IEEE, 2014, pp. 291-296.
[15] Geetika Gautam et al., "Sentiment Analysis of Twitter Data Using Machine Learning Approaches and Semantic Analysis", IEEE, 2014.
[16] Luiz F. S. Coletta et al., "Combining Classification and Clustering for Tweet Sentiment Analysis", IEEE, 2014, pp. 210-215.
[17] Alexandra Cernian et al., "Sentiment Analysis from Product Reviews Using SentiWordNet as Lexical Resource", IEEE, 2015.
[18] Saprativa Bhattacharjee et al., "Sentiment Analysis using Cosine Similarity Measure", 2015 IEEE 2nd International Conference on Recent Trends in Information Systems (ReTIS), pp. 27-32.
[19] Geetika Gautam et al., "Sentiment Analysis of Twitter Data Using Machine Learning Approaches and Semantic Analysis", IEEE, 2014.
[20] Balakrishnan Gokulakrishnan et al., "Opinion Mining and Sentiment Analysis on a Twitter Data Stream", The International Conference on Advances in ICT for Emerging Regions (ICTer), 2012, pp. 182-188.
[21] John Blitzer et al., "Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification", Association for Computational Linguistics (ACL), 2007.
[22] Maas et al., "Learning Word Vectors for Sentiment Analysis", Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011, pp. 142-150.
