Abstract

The document discusses different methods for sentiment analysis, including machine learning approaches like support vector machines, naive bayes, and maximum entropy classification. It also covers lexicon-based, rule-based, multilingual, and feature-driven sentiment analysis techniques.

Uploaded by

Shamsul Bashar

Abstract

Sentiment analysis (SA) is the process of extracting users' feelings and emotions from text. It
is one of the actively pursued fields of Natural Language Processing (NLP). The growth of
Internet-based applications has produced a massive amount of personalized reviews of all kinds
on the Web. These reviews appear in different forms, such as social media, blogs, wikis, and
forum websites. Both travelers and customers find the information in these reviews
beneficial for their understanding and planning. The boom of search engines like
Yahoo and Google has flooded users with copious amounts of relevant reviews about specific
destinations, more than a human reader can take in. Sentiment analysis is a
powerful tool for users to extract the needed information and to aggregate the collective
sentiment of the reviews. Several methods have come to prominence in recent years for
accomplishing this task. In this paper we compare the various techniques used for sentiment
analysis by examining their methodologies.

Introduction
Sentiment analysis is a kind of text classification that catalogs texts based on the sentiment
orientation of the opinions they contain. It is thus an important part of Natural Language
Processing. NLP is a field of computer science and artificial intelligence that mainly deals with
human-computer language interaction. This field is of particular use to merchants and stock
traders, and in election work.
Sentiment analysis is the process of detecting the contextual polarity of a text. It determines
whether a given text is positive, negative, or neutral. It is also called opinion mining,
since it derives the opinion or attitude of the speaker. For this analysis, opinions are collected
from users and can be employed for further improvements. Social networks act as a
medium where users can post many opinions a day, and these posts are used for
classification. A great deal of research is being carried out in sentiment analysis because of its
significance for competition in marketing and for tracking the changing needs of people. Sentiment
analysis relies on a training set, and the quality of that set plays a great role
in the accurate evaluation of the text.
Semantic analysis of the sentence also improves the accuracy of the result. Part-of-speech (POS)
tagging helps users determine whether a review or comment corresponds to
the subject searched for.

Levels of analysis
In general, sentiment analysis has been investigated mainly at three levels [1]. At the document
level, the main task is to classify whether a whole opinion document expresses a positive or negative
sentiment. This level of analysis assumes that each document expresses opinions on a single
entity. At the sentence level, the main task is to check whether each sentence expresses a positive,
negative, or neutral opinion. This level of analysis is closely related to subjectivity classification,
which distinguishes objective sentences that express factual information from subjective
sentences that express views and opinions. Document-level and sentence-level
analyses do not discover what exactly people liked and did not like. The aspect level performs a
finer-grained analysis. Instead of looking at language constructs (documents, paragraphs, sentences,
clauses, or phrases), the aspect level looks directly at the opinion itself.
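
The difference between the three levels can be illustrated with a minimal sketch. The tiny lexicon `POLARITY`, the aspect list `ASPECTS`, and all function names are hypothetical and chosen only for illustration. The aspect-level rule used here (an aspect takes the polarity of the sentence it appears in) is a crude assumption, not a method from the surveyed literature.

```python
# Hypothetical toy lexicon and aspect list, for illustration only.
POLARITY = {"great": 1, "good": 1, "bad": -1, "poor": -1}
ASPECTS = {"screen", "battery"}

def label(score):
    """Turn a numeric score into a polarity label."""
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def tokens(text):
    """Lowercase, split on whitespace, and strip trailing punctuation."""
    return [t.strip(".,!?") for t in text.lower().split()]

def score(text):
    """Sum the lexicon polarity of every token in the text."""
    return sum(POLARITY.get(t, 0) for t in tokens(text))

def sentences(document):
    """Naive sentence splitter on full stops."""
    return [s.strip() for s in document.split(".") if s.strip()]

def document_level(document):
    # One label for the whole document.
    return label(score(document))

def sentence_level(document):
    # One label per sentence.
    return [label(score(s)) for s in sentences(document)]

def aspect_level(document):
    # Assumed rule: an aspect inherits the polarity of its sentence.
    result = {}
    for s in sentences(document):
        for t in tokens(s):
            if t in ASPECTS:
                result[t] = label(score(s))
    return result

review = "The screen is great. The battery is bad."
doc_result = document_level(review)
sent_result = sentence_level(review)
aspect_result = aspect_level(review)
```

On this review the document-level label is neutral because the two opinions cancel, while the aspect-level result keeps the opinion on the screen separate from the opinion on the battery.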

Sentiment analysis Methods


Sentiment analysis has played a great role in research, and there are many
methods for carrying it out. Research is still ongoing to find better
alternatives because of its importance. Some of the methods are discussed in this
paper.

Machine learning approach


Machine learning strategies work by training an algorithm on a training data set before
applying it to the actual data set: the algorithm is first trained on
particular inputs with known outputs so that it can later work with new, unknown data [2].
Some of the best-known approaches based on machine learning are as follows:

Support Vector Machine


It is a non-probabilistic classifier that requires a large training set. It classifies
points in a d-dimensional space using a (d-1)-dimensional hyperplane, choosing the hyperplane
with the largest possible margin [3]. Support Vector Machines make use of the concept of
decision planes that define decision boundaries. A
decision plane is one that separates sets of objects having different class memberships.
An illustration is given in Fig. 1a, in which the objects belong to either class red or class
green and the separating line defines the boundary. The original objects (left side of Fig. 1b)
are mapped, or rearranged, using a mathematical function known as a kernel; this process is known
as mapping or transformation. After transformation, the mapped objects are linearly separable, so
complex curved boundaries are no longer needed to separate them.
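
The effect of the mapping can be sketched on toy data. This is not a trained SVM: the feature map `feature_map`, the weights `w`, and the offset `b` are hand-picked assumptions, used only to show how a non-linear mapping makes a linear boundary sufficient.

```python
# Illustrative sketch only: a hand-picked feature map and hyperplane,
# not the output of an SVM training procedure.

def feature_map(x):
    """Map a 1-D point x to the 2-D point (x, x^2) (an assumed mapping)."""
    return (x, x * x)

def linear_decision(point, w=(0.0, 1.0), b=-2.5):
    """Classify by the sign of w . point + b, a straight line in the mapped space."""
    value = w[0] * point[0] + w[1] * point[1] + b
    return "green" if value > 0 else "red"

# Toy data: the red points lie between the green ones, so no single
# threshold on x separates the classes in the original 1-D space.
data = [(-3, "green"), (-2, "green"), (-1, "red"), (0, "red"),
        (1, "red"), (2, "green"), (3, "green")]

predictions = [linear_decision(feature_map(x)) for x, _ in data]
```

In the mapped space the horizontal line x^2 = 2.5 separates the two classes, although no single point on the original axis does.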

Naïve Bayes Method


It is a probabilistic classifier and is mainly used when the training set is small. In
machine learning it belongs to the family of simple probabilistic classifiers based on Bayes'
theorem, which treat the features as independent of one another. The
conditional probability that an event X occurs given the evidence Y is determined by Bayes' rule,
given in (1).

$$P(X \mid Y) = \frac{P(Y \mid X)\, P(X)}{P(Y)} \qquad (1)$$
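
A minimal sketch of a Naïve Bayes text classifier built on Bayes' rule follows. The training sentences, the function names, and the use of Laplace (add-one) smoothing are assumptions made for illustration; the source does not specify them.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_docs):
    """Count documents and words per class; labeled_docs is a list of (text, label)."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary

def predict(text, model):
    """Return the label maximizing log P(label) + sum of log P(word | label)."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_log_prob = None, -math.inf
    for label in class_counts:
        log_prob = math.log(class_counts[label] / total_docs)  # prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Laplace smoothing (an assumed choice) avoids zero probabilities.
            count = word_counts[label][word] + 1
            log_prob += math.log(count / (total_words + len(vocabulary)))
        if log_prob > best_log_prob:
            best_label, best_log_prob = label, log_prob
    return best_label

# Hypothetical toy training set.
training = [
    ("great hotel friendly staff", "positive"),
    ("lovely view great food", "positive"),
    ("dirty room rude staff", "negative"),
    ("terrible food noisy room", "negative"),
]
model = train_naive_bayes(training)
first = predict("great view", model)
second = predict("rude noisy room", model)
```

The per-word probabilities are simply multiplied (added in log space), which is where the independence assumption enters.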

Maximum Entropy Classifier
A Maximum Entropy (ME) classifier, or conditional exponential classifier, is parameterized by a
set of weights that are used to combine the joint-features generated from a set of features
by an encoding. The encoding maps each pair of feature set and label to a vector. ME classifiers
belong to the set of classifiers known as exponential or log-linear classifiers, because they
work by extracting some set of features from the input, combining them linearly, and then using
this sum as an exponent. If the method is carried out in an unsupervised manner, Pointwise Mutual
Information (PMI) is used to find the co-occurrence of a word with positive and
negative words. The ME classifier is one of the models that do not assume the features are
independent [7]. The measure of uncertainty is known as entropy, and it is maximal for a uniform
distribution. The model should therefore be as uniform as possible while still obeying the
constraints that are imposed.
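
The log-linear form and the role of entropy can be sketched as follows. The weights are hypothetical values picked by hand rather than learned, and the function names are illustrative.

```python
import math

def maxent_probabilities(features, weights, labels):
    """P(label | features) is proportional to exp(sum of joint-feature weights)."""
    scores = {
        label: sum(weights.get((feature, label), 0.0) for feature in features)
        for label in labels
    }
    normalizer = sum(math.exp(s) for s in scores.values())
    return {label: math.exp(s) / normalizer for label, s in scores.items()}

def entropy(distribution):
    """Shannon entropy (natural log) of a probability distribution."""
    return -sum(p * math.log(p) for p in distribution.values() if p > 0)

labels = ["positive", "negative"]
# Hypothetical weights on (feature, label) joint-features.
weights = {("good", "positive"): 1.2, ("good", "negative"): -0.8}

with_evidence = maxent_probabilities(["good"], weights, labels)
no_evidence = maxent_probabilities([], weights, labels)
```

With no active features nothing constrains the model, so it falls back to the uniform distribution, which has the highest entropy. Once the feature "good" is observed, the distribution moves toward the positive label and its entropy drops.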

K-NN
The K-Nearest Neighbour (k-NN) method is based on the idea that the classification of an instance
will be similar to that of the instances near it in the vector space. Some researchers have
further studied a weighted k-Nearest Neighbour method, in which they assigned weights to the
elements in the training set and used these weights to calculate the sentiment of a text word by
word [8]. Here the score is calculated using (4).
$$\text{Positivity Score} = \frac{\sum_{1}^{j} \text{score(pos)} + \sum_{1}^{k} \text{score(neg)}}{\sum_{1}^{s} \text{maximum score}} \qquad (4)$$
Here s = j + k, i.e. the count of positive and negative terms together. In the weighted k-NN
method the authors first tokenize the sentences and remove the stop words from the tweets they
have fetched. The algorithm proposed by the authors of [8] is carried out in two passes. A positive
score is assigned to each review after the first pass. This is passed on to the second pass,
together with a neutral review as input, and the score is modified if required. This is done to
determine positivity more reliably, and an output file consisting of each review ID and its
positive score is produced.
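
One possible reading of the positivity score in (4) is sketched below. The stop-word list, the word scores, and the maximum score per word are all hypothetical, and the sketch assumes that negative words carry negative scores; [8] may define these quantities differently. The second pass described above is not modelled.

```python
# Hypothetical resources, for illustration only.
STOP_WORDS = {"the", "a", "is", "was", "and", "but"}
WORD_SCORES = {"good": 2.0, "great": 3.0, "bad": -2.0, "awful": -3.0}
MAX_SCORE = 3.0  # assumed maximum score any single word can receive

def positivity_score(text):
    """Tokenize, drop stop words, then apply one reading of Eq. (4)."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    scored = [WORD_SCORES[t] for t in tokens if t in WORD_SCORES]
    if not scored:
        return 0.0  # no sentiment-bearing words found
    pos_sum = sum(s for s in scored if s > 0)  # sum over the j positive words
    neg_sum = sum(s for s in scored if s < 0)  # sum over the k negative words
    # Denominator: maximum score summed over all s = j + k scored words.
    return (pos_sum + neg_sum) / (MAX_SCORE * len(scored))

mixed = positivity_score("the food was great but the room was bad")
best = positivity_score("great great")
empty = positivity_score("the room")
```

Under these assumptions the score lies between -1 and 1, and a review reaches 1 only when every scored word has the maximum positive score.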

Multilingual Sentiment Analysis


Nowadays customers can express their views in the language of their choice, so to
obtain better results researchers should consider posts in different languages. This is elaborated
in [9], which describes a method for determining the polarity of a text within a multilingual
framework, using several natural language toolkits. The
language of the text is first identified using a language model. After identification, the text is
translated into English using standard translation software; [9] uses PROMT
eXcellent Translation (XT) Technology for this purpose. The translated text then goes
through sentiment classification [10].
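
The three stages (identify the language, translate to English, classify) can be sketched as a pipeline. Every component here is a hypothetical stand-in: the marker-word language identifier replaces a real language model, and the word-by-word dictionary replaces translation software such as the PROMT system used in [9].

```python
# All word lists below are tiny, hypothetical stand-ins.
LANGUAGE_MARKERS = {
    "en": {"the", "is", "and", "was"},
    "es": {"el", "la", "es", "muy"},
}
ES_TO_EN = {"la": "the", "comida": "food", "es": "is",
            "muy": "very", "buena": "good", "mala": "bad"}
POLARITY = {"good": 1, "bad": -1}

def identify_language(text):
    """Pick the language whose marker words overlap most with the text."""
    tokens = set(text.lower().split())
    return max(LANGUAGE_MARKERS, key=lambda lang: len(tokens & LANGUAGE_MARKERS[lang]))

def translate_to_english(text, language):
    """Word-by-word dictionary lookup; unknown words pass through unchanged."""
    if language == "en":
        return text.lower()
    return " ".join(ES_TO_EN.get(tok, tok) for tok in text.lower().split())

def classify(english_text):
    """Sum lexicon polarities of the English tokens."""
    score = sum(POLARITY.get(tok, 0) for tok in english_text.split())
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def analyze(text):
    language = identify_language(text)
    return language, classify(translate_to_english(text, language))

spanish_result = analyze("La comida es muy buena")
english_result = analyze("the food was bad")
```

Only the order of the stages reflects the description above; the quality of a real system depends on the language model and translator actually used.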

Feature Driven Sentiment Analysis
Product feature extraction plays a key role in the evaluation of products, since knowledge of
the features and their relationships is important for an enhanced
marketing plan. In [11], this is done with a Fuzzy Domain Ontology Sentiment Tree (FDOST).
In FDOST, the root node represents the product,
the leaf nodes represent the polarity, and the non-leaf nodes represent the sub-features of their
corresponding parent features.
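
The tree shape described for FDOST can be sketched as a small data structure. The class name `FeatureNode`, the example product, the polarity values, and the averaging rule in `aggregate` are assumptions made for illustration; [11] may combine polarities differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class FeatureNode:
    name: str
    polarity: Optional[float] = None  # set only on leaf (polarity) nodes
    children: List["FeatureNode"] = field(default_factory=list)

    def aggregate(self):
        """Average the aggregated values of the children (an assumed rule)."""
        if not self.children:
            return self.polarity
        values = [child.aggregate() for child in self.children]
        return sum(values) / len(values)

def leaf(value):
    """Hypothetical helper: a leaf node holding a polarity value."""
    return FeatureNode("polarity", polarity=value)

# Root = product, inner nodes = sub-features, leaves = polarity.
camera = FeatureNode("camera", children=[
    FeatureNode("battery", children=[
        FeatureNode("battery life", children=[leaf(0.8)]),
        FeatureNode("charging time", children=[leaf(-0.2)]),
    ]),
    FeatureNode("screen", children=[leaf(0.6)]),
])

battery_score = camera.children[0].aggregate()
product_score = camera.aggregate()
```

Walking the tree from the leaves upward gives a score per sub-feature as well as one for the whole product.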

Rule Based Approach


The rule-based approach defines various rules for obtaining the opinion. It works by
tokenizing each sentence in every document and then testing each token, or word, for its
presence in a database of sentiment words. If the word is present and has a positive sentiment,
a +1 rating is applied to it. Each post starts with a neutral score of zero and is considered
positive if the final polarity score is greater than zero, or negative if the overall score is
less than zero [12]. After the rule-based approach produces its output, the system checks, or
asks, whether the output is correct. If the input sentence contains a word that is not present in
the database but may help in the analysis of movie reviews, that word is added to the database.
This is supervised learning, in which the system is trained to learn whenever a new input is given.
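
The scoring rule and the feedback step can be sketched as follows. The word sets and function names are hypothetical, and the -1 rating for negative words is an assumed mirror of the +1 rule, which is the only one stated explicitly above.

```python
# Hypothetical word database for movie reviews.
POSITIVE_WORDS = {"good", "great", "enjoyable"}
NEGATIVE_WORDS = {"bad", "boring", "dull"}

def rule_based_polarity(post):
    """Score a post token by token, starting from a neutral score of zero."""
    score = 0
    for token in post.lower().split():
        token = token.strip(".,!?")
        if token in POSITIVE_WORDS:
            score += 1  # +1 rating for a positive word
        elif token in NEGATIVE_WORDS:
            score -= 1  # assumed symmetric rule for a negative word
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def add_word(word, sentiment):
    """Feedback step: add a previously unknown word to the database."""
    target = POSITIVE_WORDS if sentiment == "positive" else NEGATIVE_WORDS
    target.add(word.lower())

before = rule_based_polarity("A gripping story")
add_word("gripping", "positive")
after = rule_based_polarity("A gripping story")
```

Before the feedback step the unknown word contributes nothing and the post stays neutral; once the word is in the database the same post is scored as positive.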

Lexical Based Approach


Lexicon-based techniques work on the assumption that the collective polarity of a sentence or
document is the sum of the polarities of its individual phrases or words. At the ROMIP 2012
seminar, the lexicon-based method proposed in [14] was used. This method relies on emotional
dictionaries for sentiment analysis, one for each domain. Next, each domain dictionary was
replenished with the appraisal words from the corresponding training collection that have the
highest weight, calculated by the RF (Relevance Frequency) method [15]. A word-modifier changes
(increases or decreases) the weight of the following appraisal word by a certain percentage.
A word-negation shifts the weight of the following appraisal word by a certain offset: downward
for positive words, upward for negative words.
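
How modifiers and negations act on appraisal words can be sketched as below. The dictionary entries, the percentages, and the offset are hypothetical values; the sketch also assumes a modifier or negation applies to the next appraisal word even if other words intervene.

```python
# Hypothetical dictionary values, for illustration only.
APPRAISAL_WEIGHTS = {"good": 1.0, "excellent": 2.0, "bad": -1.0}
MODIFIERS = {"very": 0.5, "slightly": -0.5}  # +50% and -50% of the weight
NEGATIONS = {"not", "never"}
NEGATION_OFFSET = 1.5

def text_weight(text):
    """Sum the adjusted weights of all appraisal words in the text."""
    total = 0.0
    scale, negated = 1.0, False
    for token in text.lower().split():
        if token in MODIFIERS:
            scale *= 1.0 + MODIFIERS[token]  # percentage change
        elif token in NEGATIONS:
            negated = True
        elif token in APPRAISAL_WEIGHTS:
            weight = APPRAISAL_WEIGHTS[token] * scale
            if negated:
                # Shift by a fixed offset: down for positive, up for negative.
                weight += -NEGATION_OFFSET if weight > 0 else NEGATION_OFFSET
            total += weight
            scale, negated = 1.0, False  # reset after each appraisal word
    return total

plain = text_weight("good")
boosted = text_weight("very good")
negated_good = text_weight("not good")
negated_bad = text_weight("not bad")
```

Because negation shifts the weight by an offset instead of flipping its sign, "not bad" ends up mildly positive here, not as strongly positive as "good".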
The text sentiment classification procedure was carried out as follows. First, the weights of all
training texts and of the text to be classified were calculated. All the texts were placed in a
one-dimensional emotional space. The proportion of deletions was determined by cross-validation.
Then the average weight of the training texts in each sentiment class was found. The text being
classified was assigned to the class whose average was closest in the one-dimensional emotional
space.
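
The final assignment step amounts to a nearest-mean rule on a single axis, sketched below. The training weights are hypothetical, the function names are illustrative, and the cross-validated deletion step is omitted because its details are not given here.

```python
from statistics import mean

def class_means(weighted_training_texts):
    """Average text weight per sentiment class; input is a list of (weight, label)."""
    by_class = {}
    for weight, label in weighted_training_texts:
        by_class.setdefault(label, []).append(weight)
    return {label: mean(weights) for label, weights in by_class.items()}

def classify_by_nearest_mean(weight, means):
    """Assign the class whose mean is closest on the one-dimensional axis."""
    return min(means, key=lambda label: abs(weight - means[label]))

# Hypothetical text weights, e.g. produced by a lexicon-based scorer.
training = [(2.0, "positive"), (1.0, "positive"),
            (-1.5, "negative"), (-0.5, "negative")]
means = class_means(training)
first = classify_by_nearest_mean(0.5, means)
second = classify_by_nearest_mean(-0.2, means)
```

With these values the class means are 1.5 and -1.0, so a text with weight 0.5 falls nearer to the positive class and one with weight -0.2 nearer to the negative class.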

Comparison and Consolidation


A comparison and consolidation of the three main approaches used in sentiment analysis is
shown in Table 1.
Performing sentiment analysis with different approaches produces different results, and each
approach has its own pros and cons. Considering key factors such as performance, efficiency,
and accuracy, the machine learning approach yields the best results, and most work has been
done with this approach. Several methods have evolved for this task; they are described in
Table 2.

Conclusion
Various sentiment analysis methods and the different levels at which sentiment is analysed have
been studied in this paper. Our ultimate aim is to arrive at a sentiment analysis approach that
efficiently categorizes various reviews.
Machine learning methods such as SVM, NB, and Maximum Entropy were discussed here in
brief, along with some other interesting methods that can improve the analysis process in one
way or another. Semantic analysis of the text deserves particular consideration. Research is
being carried out on better analysis methods in this area, including capturing semantics through
n-gram evaluation instead of word-by-word analysis. We have also covered some other methods, such
as rule-based and lexicon-based methods. In the world of the Internet, the majority of people
depend on social networking sites for the information they value, so analyzing the reviews from
these sites will yield a better understanding and help in their decision-making.
