Abstract
Sentiment analysis (SA) is the process of extracting users' feelings and emotions from text. It
is one of the most actively pursued fields of Natural Language Processing (NLP). The growth of
Internet-based applications has produced a massive amount of personalized reviews on all kinds
of topics on the Web. These reviews appear in different forms, such as social media, blogs, wikis
and forum websites. Both travelers and customers find the information in these reviews
beneficial for their understanding and planning. Search engines like Yahoo and Google return
so many relevant reviews about a specific destination that reading them all is beyond human
capacity. Sentiment analysis is a powerful tool for extracting the needed information from
these reviews and for aggregating their collective sentiment. Several methods for this task
have come to prominence in recent years. In this paper we compare the techniques used for
sentiment analysis by examining their methodologies.
Introduction
Sentiment analysis is a kind of text classification that categorizes texts by the sentiment
orientation of the opinions they contain. It is thus an important part of Natural Language
Processing. NLP is a field of computer science and artificial intelligence that mainly deals with
language interaction between humans and computers. Sentiment analysis is of particular use to
merchants, stock traders, and election campaigns.
Sentiment analysis is the process of detecting the contextual polarity of a text: it determines
whether the given text is positive, negative or neutral. It is also called opinion mining,
since it derives the opinion or attitude of the speaker. For this analysis, opinions are collected
from users and can then be used to drive further improvements. Social networks act as a
medium where users can post many opinions a day, and these posts are used for
classification. Much research is under way in sentiment analysis because of its
significance for competition in marketing and for tracking people's changing needs. Sentiment
analysis based on machine learning requires a training set, whose quality plays a major role
in the accurate evaluation of the text.
Semantic analysis of the sentence also improves the accuracy of the result. Part-of-speech (POS)
tagging helps determine whether a review or comment actually concerns the subject the user
searched for.
Levels of analysis
In general, sentiment analysis has been investigated mainly at three levels [1]. At the document
level, the main task is to classify whether a whole opinion document expresses a positive or
negative sentiment. This level of analysis assumes that each document expresses opinions on a
single entity. At the sentence level, the main task is to determine whether each sentence
expresses a positive, negative, or neutral opinion. This level of analysis is closely related to
subjectivity classification, which distinguishes objective sentences that express factual
information from subjective sentences that express personal views and opinions. Neither
document-level nor sentence-level analysis discovers what exactly people liked and did not like.
Aspect-level analysis is finer-grained: instead of looking at language constructs (documents,
paragraphs, sentences, clauses or phrases), it looks directly at the opinion itself.
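The difference between the three levels can be illustrated with a minimal sketch. It is not a
method from [1]: the toy lexicon `LEXICON`, the aspect list `ASPECTS`, the clause-splitting rule
and all function names are assumptions made purely for illustration.

```python
import re

# Hypothetical toy lexicon and aspect list, for illustration only.
LEXICON = {"great": 1, "friendly": 1, "clean": 1, "noisy": -1, "rude": -1, "dirty": -1}
ASPECTS = {"room", "staff", "location"}


def tokens(text):
    """Split a text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def polarity(words):
    """Sum the lexicon scores of the words and map the total to a label."""
    total = sum(LEXICON.get(w, 0) for w in words)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


def document_level(review):
    """One label for the whole review."""
    return polarity(tokens(review))


def sentence_level(review):
    """One label per sentence."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", review) if s.strip()]
    return [(s, polarity(tokens(s))) for s in sentences]


def aspect_level(review):
    """Attach each clause's polarity to the aspect terms that clause mentions."""
    results = {}
    for clause in re.split(r"[.!?,;]+|\bbut\b", review.lower()):
        words = tokens(clause)
        for aspect in ASPECTS.intersection(words):
            results[aspect] = polarity(words)
    return results


REVIEW = "The room was clean and great. The staff were rude."
print(document_level(REVIEW))
print(sentence_level(REVIEW))
print(aspect_level(REVIEW))
```

For this review the document-level label is positive, which hides the complaint about the staff.
The sentence-level output separates the two opinions, and only the aspect-level output ties each
opinion to its target (room, staff).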
K-NN
The K-Nearest Neighbour (k-NN) method is based on the assumption that an instance is likely to
share the class of the instances nearest to it in the vector space. The authors of [8] studied a
weighted k-Nearest Neighbour method, in which they assigned weights to the elements of the
training set and used these weights to calculate the sentiment of a text word by word. The
score is calculated using (4).
$$\text{Positivity Score} = \frac{\sum_{1}^{j} \text{score}(\text{pos}) + \sum_{1}^{k} \text{score}(\text{neg})}{\sum_{1}^{s} \text{maximum score}} \tag{4}$$
Here s = j + k, i.e. the combined count of positive and negative terms. In the weighted k-NN
method, the authors first tokenize the sentences and remove the stop words from the tweets they
have fetched. The algorithm proposed by the authors of [8] is carried out in two parses. A
positive score is assigned to each review after the first parse. This is passed to the second
parse, together with a neutral review as input. Using this, the score is modified if required.
This is done for better positivity determination, and an output file consisting of each review
ID and its positive score is produced.
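The two ideas in this section can be sketched as follows. This is an illustrative sketch under
stated assumptions, not the algorithm of [8]: `knn_predict` is a plain (unweighted) k-NN majority
vote over bag-of-words vectors with cosine similarity, and `positivity_score` evaluates (4)
assuming every scored term has the same maximum score. The stop-word list, the training data and
all function names are hypothetical, and the two-parse procedure of [8] is not reproduced
because the text does not specify it in detail.

```python
import math
from collections import Counter

# Hypothetical stop-word list and training data, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "is", "was", "were"}
TRAINING_SET = [
    ("great hotel friendly staff", "positive"),
    ("lovely view great food", "positive"),
    ("dirty room rude staff", "negative"),
    ("terrible food dirty bathroom", "negative"),
]


def bag_of_words(text):
    """Tokenize on whitespace, drop stop words, and count the remaining words."""
    return Counter(w for w in text.lower().split() if w not in STOP_WORDS)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(text, training_set, k=3):
    """Label `text` by majority vote among its k most similar training texts."""
    query = bag_of_words(text)
    ranked = sorted(
        training_set,
        key=lambda pair: cosine_similarity(query, bag_of_words(pair[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def positivity_score(pos_scores, neg_scores, max_score):
    """Evaluate (4): summed term scores divided by the summed maximum score.

    Assumption: each of the s = j + k scored terms contributes the same
    `max_score` to the denominator. The sign convention of the negative
    scores is not stated in the text, so they are summed as given.
    """
    s = len(pos_scores) + len(neg_scores)
    if s == 0:
        return 0.0
    return (sum(pos_scores) + sum(neg_scores)) / (s * max_score)


print(knn_predict("great food and friendly staff", TRAINING_SET, k=3))
print(positivity_score([3, 2], [1], max_score=4))
```

With j = 2 positive scores (3 and 2), k = 1 negative score (1) and a maximum score of 4 per
term, (4) gives (3 + 2 + 1) / (3 × 4) = 0.5.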
Conclusion
Various sentiment analysis methods and the different levels at which sentiment is analysed have
been studied in this paper. Our ultimate aim is to develop a sentiment analysis method that
efficiently categorizes reviews.
Machine learning methods such as SVM, NB and Maximum Entropy were discussed here in
brief, along with some other interesting methods that can improve the analysis process in one
way or another. Semantic analysis of the text deserves particular attention. Research continues
on better analysis methods in this area, including methods that capture semantics by evaluating
n-grams instead of analysing word by word. We have also come across other methods, such as
rule-based and lexicon-based methods. On the Internet, the majority of people depend on
social networking sites for information they value; analyzing the reviews from these sites
will yield a better understanding and help in their decision-making.