Sentiment Analysis of Online Data For Business Analytics: Synopsis
The system analyses customer comments on Twitter and other platforms using a
convolutional neural network combined with a support vector machine for text
sentiment analysis. Instead of requiring someone to read through thousands of
reviews, the proposed model classifies reviews by polarity and learns from them.
It also identifies fake reviews by applying supervised machine learning
algorithms. A location-based analysis feature helps identify the interests of
people from a particular region. Data from several sources is used, such as
Twitter comments, Amazon product reviews and hotel reviews.
INTRODUCTION:
Social media has redefined how companies strategize their business processes.
It contains a massive volume of unstructured data (e.g. tweets, comments, blogs,
forum discussions, user posts, and reviews) that can be used for business
intelligence such as customer profiling and content analytics. Twitter, an online
social networking service, is used by many companies mainly as a marketing and
promotion tool. Specifically, Twitter data contains not only user information but
also texts that carry subjective information (such as user sentiments) towards a
particular issue. From a business perspective, the wealth of tweets lets
companies gather feedback about their products and services from their customers
without having to pay for costly customer surveys and interviews. On the other
hand, analyzing and extracting information from unstructured data poses a
formidable challenge to data miners. Humans can easily find patterns and trends
in documents, but this ability is limited when a large amount of data is involved.
METHODOLOGY:
1. Connect to live media data streams (Twitter, Facebook, Yelp, Amazon), extract
data using their APIs and store it on Hadoop
2. Process the data in Hadoop: restructure, filter and derive useful insights from it
3. Create tables in Hadoop
4. Create an attractive, user-friendly interface for end users to run queries
5. Perform sentiment analysis by comparing public sentiment about a subject
6. Provide location-based classification and analysis
7. Identify fake reviews using supervised machine learning algorithms
8. Provide visualization of analytics
9. Create pie charts, percentage calculations, bar diagrams, word clouds and histograms
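The location-based analysis in the steps above could look roughly like the
following. This is a minimal sketch in Python for illustration only; the project
itself is built in R, and the function name sentiment_by_region, the record
layout (region, label) and the sample regions are all assumptions, not part of
the original system.

```python
from collections import Counter, defaultdict


def sentiment_by_region(records):
    """Count sentiment labels per region.

    `records` is an iterable of (region, label) pairs; both fields are
    hypothetical and would come from the stored, already classified data.
    """
    summary = defaultdict(Counter)
    for region, label in records:
        summary[region][label] += 1
    # Convert to plain dicts so the result is easy to print or tabulate.
    return {region: dict(counts) for region, counts in summary.items()}


# Illustrative records only; not real data.
sample = [
    ("Kerala", "positive"),
    ("Kerala", "negative"),
    ("Kerala", "positive"),
    ("Delhi", "negative"),
]
print(sentiment_by_region(sample))
```

Grouping by region before plotting keeps the per-region counts available for
the pie charts and bar diagrams mentioned in the last step.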
OVERVIEW
Tweets are imported using R, and the data is cleaned by removing emoticons and
URLs. Lexical analysis and a Naive Bayes classifier are used to predict the
sentiment of tweets, and the resulting opinion is presented graphically through
ggplot2 plots, histograms, pie charts, word clouds and tables. The front end has
been created using Shiny.
FEATURES
1. Extraction of data
(i) Create a Twitter application
(ii) twitteR - Provides an interface to the Twitter web API
(iii) ROAuth - R interface for OAuth
(iv) Create a Twitter authenticated credential object (using the keys issued for
the Twitter application and the cacert.pem certificate): this is done using the
consumer key, consumer secret, access token and access secret.
(v) During authentication, we are redirected automatically to a URL where we
click on Authorize app, as shown in the image below, and enter the unique
7-digit number to link to the account from which feeds are being taken.
2. Cleaning data
The tweets are cleaned in R by removing:
● Extra punctuation
● Stop words (the most commonly used words in a language, such as the, is, at,
which, and on)
● Redundant blank spaces
● Emoticons
● URLs
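The cleaning steps listed above can be sketched as follows. This is an
illustrative Python version under stated assumptions, not the project's R code:
the stop-word list, the emoji ranges and the emoticon pattern are small
hand-picked examples, and clean_tweet is a hypothetical name.

```python
import re
import string

# Small illustrative stop-word list (assumption); a real run would use a fuller one.
STOP_WORDS = {"the", "is", "at", "which", "and", "on", "a", "an", "of", "to", "in"}

URL_RE = re.compile(r"(?:https?://|www\.)\S+")
# Rough emoji ranges; not exhaustive (assumption).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# A few common ASCII emoticons such as :) ;-( =D (assumption).
EMOTICON_RE = re.compile(r"(?<!\w)[:;=][-']?[)(\]\[DPpOo/\\|](?!\w)")
# Map every punctuation character to a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_tweet(text):
    """Remove URLs, emoticons, punctuation, stop words and extra spaces."""
    text = URL_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    text = EMOTICON_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    # split() with no arguments also collapses redundant blank spaces.
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    return " ".join(words)


print(clean_tweet("The food is GREAT!!! :) http://t.co/abc   and cheap"))
```

URLs are removed before punctuation so that the slashes and dots inside a link
do not leave stray fragments behind.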
4. Algorithms used
● Lexical Analysis: Unigrams are compared against a pre-loaded word
database, each tweet is assigned a sentiment score (positive, negative or
neutral), and an overall score is calculated.
● Naive Bayes Machine Learning Algorithm: Training data sets are
used to teach the machine which kinds of sentences are categorized as
positive and which as negative. When a new tweet or sentence arrives,
the machine uses this algorithm to assign it to the most likely
category and attaches a level to the emotion.
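Both approaches above can be sketched in a few lines. The Python below is
illustrative only (the project uses R): the word lists stand in for the
pre-loaded word database, which is not shown in this synopsis, and the
classifier is a plain multinomial Naive Bayes with add-one smoothing, which may
differ from the variant the project actually uses. The names lexicon_score and
NaiveBayes are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Tiny illustrative word lists (assumption), standing in for the word database.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "sad"}


def lexicon_score(text):
    """Return (score, label): +1 per positive unigram, -1 per negative one."""
    words = text.lower().split()
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return score, label


class NaiveBayes:
    """Multinomial Naive Bayes over unigrams with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            logp = math.log(n_docs / total_docs)  # class prior
            for w in text.lower().split():
                if w in self.vocab:  # ignore words never seen in training
                    logp += math.log((counts[w] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Illustrative training sentences only; not real data.
model = NaiveBayes().fit(
    ["great food love it", "excellent service happy",
     "terrible food hate it", "poor service sad"],
    ["positive", "positive", "negative", "negative"],
)
print(lexicon_score("great food but terrible terrible service"))
print(model.predict("love the excellent food"))
```

The lexicon method needs no training data but only sees words in its lists,
while the Naive Bayes model learns word weights from labelled examples.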
7. Results representation
In the table tab of our Shiny web app, as shown below, we present the scores,
the tweets and the percentage of positive/negative emotion in the text. This is
calculated using simple arithmetic to give a better understanding of the overall
sentiment.
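The synopsis does not spell out the arithmetic. One plausible reading, sketched
here in Python for illustration (the project uses R), is the share of tweets
whose score is positive, negative or zero. The function name
sentiment_percentages and the input format (a list of numeric scores) are
assumptions.

```python
def sentiment_percentages(scores):
    """Share of positive, negative and neutral scores, in percent."""
    total = len(scores)
    if total == 0:
        # Avoid division by zero when no tweets were scored.
        return {"positive": 0.0, "negative": 0.0, "neutral": 0.0}
    positive = sum(1 for s in scores if s > 0)
    negative = sum(1 for s in scores if s < 0)
    neutral = total - positive - negative
    return {
        "positive": 100.0 * positive / total,
        "negative": 100.0 * negative / total,
        "neutral": 100.0 * neutral / total,
    }


# Illustrative scores only; not real data.
print(sentiment_percentages([2, -1, 0, 3]))
```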
SYSTEM REQUIREMENTS
● Installation of R
● Installation of RStudio
● Installation of Hadoop HDFS
● Media authentication to access the APIs (Twitter, Amazon, etc.)
Software Requirement
• Operating System : Windows XP or above
• Web Server : Apache
• IDE : NetBeans
• Language : R
• Interface framework : Shiny
• Storage : Hadoop (HDFS)
Hardware Requirements
• Processor : Intel Core i5
• RAM : 4 GB