
Sentiment Analysis of Online Data For Business Analytics: Synopsis

The document proposes a system for sentiment analysis of online data from sources like Twitter, Amazon reviews, and other social media platforms. It involves collecting and cleaning the data, performing sentiment analysis using techniques like convolutional neural networks and support vector machines, identifying fake reviews, and providing visualizations and insights. The system is designed to be an all-in-one analytics tool with features like location-based analysis and an attractive user interface.

Uploaded by

sachin mohan

SYNOPSIS

Sentiment Analysis of Online Data for Business Analytics

ABSTRACT

In recent years, social networks have gained unprecedented popularity in
business because of their potential to drive growth. Companies can learn
more about consumers’ sentiments towards their products and services, and use
this knowledge to better understand the market and improve their brand. As a
result, companies regularly reinvent their marketing strategies and campaigns
to fit consumers’ preferences. Social analysis harnesses the vast volume of data
in social networks to mine information critical to strategic decision making. It
uses machine learning techniques and tools to identify patterns and trends and
to gain actionable insights.

The system is designed to analyse customer comments on Twitter and other
platforms using a convolutional neural network combined with a support vector
machine for text sentiment analysis. Instead of requiring someone to read
through thousands of reviews, the proposed model classifies the polarity of the
reviews and learns from them. It also adds a feature for identifying fake
reviews by applying supervised machine learning algorithms. A location based
analysis feature helps identify the interests of people from a particular
region. Data from several sources is used, such as Twitter comments, Amazon
product reviews and hotel reviews.

INTRODUCTION:

Social media has redefined how companies strategize their business processes.
It contains a massive volume of unstructured data (e.g. tweets, comments, blogs,
forum discussions, user posts, and reviews) that can be used for business
intelligence such as customer profiling and content analytics. Twitter, an
online social networking service, is used as a marketing and promotion tool by
many companies. Specifically, Twitter data contains not only user information
but also text carrying subjective information (such as user sentiments) towards
a particular issue. From a business perspective, the wealth of tweets lets
companies gather feedback about their products and services from their
customers without having to pay for costly customer surveys and interviews. On
the other hand, analyzing and extracting information from unstructured data
poses a formidable challenge to data miners. Humans can easily find patterns
and trends in documents, but this ability is limited when a large amount of
data is involved.

The system makes use of sentiment analysis in business applications.
Furthermore, this paper demonstrates the text analysis process for reviewing
customers’ public opinion of a certain brand, and presents hidden knowledge
(e.g. customer and business insights) that can be used for decision making
after the text analysis is performed. Moreover, academic literature on text
analytics of Twitter data is limited; this paper therefore attempts to
contribute to this developing field by providing a practical guide on how to
mine and analyse customers’ tweets.

METHODOLOGY:
1. Connect to live media data streams (Twitter, Facebook, Yelp, Amazon), extract
data using their APIs and store it on Hadoop
2. Process the data in Hadoop: restructure, filter and derive useful insights from it
3. Create tables in Hadoop
4. Create an attractive, user friendly interface for end users to run queries
5. Perform sentiment analysis by comparing public sentiment about a subject
6. Provide location based classification and analysis
7. Identify fake reviews using supervised machine learning algorithms
8. Provide visualization of analytics
9. Create pie charts, percentage calculations, bar diagrams, word clouds and histograms

Existing System                                         | Proposed System
Focus on Twitter data                                   | Focus on Twitter, Amazon, Yelp and other online sources
No facility for fake review identification              | Integrated with fake review identification using supervised machine learning algorithms
Location based features available on independent models | Integrated location based review analysis
No user friendly interface                              | Attractive user friendly interface
Not an all in one analysis system                       | Enhanced to be an all in one analysis system
SVM not used                                            | SVM is used

OVERVIEW
Tweets are imported using R and the data is cleaned by removing emoticons and
URLs. Lexical analysis and a Naive Bayes classifier are used to predict the
sentiment of tweets, and the resulting opinion is then expressed graphically
through ggplot2 plots, histograms, pie charts, word clouds and tables. The front
end is built with Shiny.

FEATURES
1. Extraction of data
(i) Create a Twitter application
(ii) twitteR - provides an interface to the Twitter web API
(iii) ROAuth - R interface for OAuth
(iv) Create a Twitter authenticated credential object (using the keys from
step (i) and the cacert.pem certificate). This is done using the consumer key,
consumer secret, access token and access secret.
(v) During authentication, we are redirected automatically to a URL where we
click on Authorize app and enter the unique 7-digit number to get linked to
the account from which feeds are being taken.

2. Cleaning data
The tweets are cleaned in R by removing:
● Extra punctuation
● Stop words (Most commonly used words in a language like the, is, at,
which, and on.)
● Redundant blank spaces
● Emoticons
● URLs
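
The cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration in Python rather than the project's own R code. The stop-word list and the emoticon and emoji patterns are assumptions made for the example, not the lists actually used.

```python
import re
import string

# Illustrative stop-word list (assumption; the project's actual list is not given).
STOP_WORDS = {"the", "is", "at", "which", "on", "a", "an", "and", "to", "of"}

URL_RE = re.compile(r"(?:https?://|www\.)\S+")
# Rough patterns for text emoticons such as :) or ;-( and for common emoji.
# They are illustrative and far from exhaustive.
EMOTICON_RE = re.compile(r"(?<!\S)[:;=][\-']?[)(\]\[dDpP/\\|](?!\S)")
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# Map every ASCII punctuation character to a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_tweet(text: str) -> str:
    """Remove URLs, emoticons, punctuation, stop words and extra spaces."""
    text = URL_RE.sub(" ", text)          # URLs first, before ':' and '/' are touched
    text = EMOTICON_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    return " ".join(words)                # split/join collapses redundant blanks


print(clean_tweet("The battery is GREAT!!! :) see http://t.co/abc123   now"))
# prints: battery great see now
```

URLs are removed before punctuation so that the characters inside a link are not split into stray tokens.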

3. Loading Word Database
A database of positive and negative words, created by Hu and Liu, is loaded
into R. This is used for Lexical Analysis, where the words in the tweets are
compared with the words in the database and the sentiment is predicted.
For movie tweets, the Naive Bayes machine learning algorithm is used. AFINN is
a list of English words rated for valence with an integer between minus five
(negative) and plus five (positive). The words were manually labeled by Finn
Årup Nielsen in 2009-2011. The file is tab-separated. The version used is:
AFINN-111: Newest version with 2477 words and phrases.
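
As a rough illustration of how a tab-separated valence list of this kind can be loaded and applied, the sketch below parses lines of the form `term<TAB>score` and sums the scores of the words in a text. It is written in Python rather than R. The function names and the sample entries are made up for the example and are not copied from the AFINN-111 file.

```python
def parse_afinn(lines):
    """Build a {term: integer valence} dictionary from 'term<TAB>score' lines."""
    lexicon = {}
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        # Split on the last tab so that multi-word phrases stay intact.
        term, _, score = line.rpartition("\t")
        lexicon[term] = int(score)
    return lexicon


def afinn_score(text, lexicon):
    """Sum the valence of each single word; unknown words count as 0."""
    return sum(lexicon.get(word, 0) for word in text.lower().split())


# Hand-written sample in the same layout (illustrative values only).
# A real file could be read with: open(path, encoding="utf-8")
SAMPLE = ["good\t3", "bad\t-3", "superb\t5", "disappointing\t-2"]
lexicon = parse_afinn(SAMPLE)

print(afinn_score("Superb screen but bad battery", lexicon))
# prints: 2
```

This word-by-word lookup ignores the multi-word phrases in the list; matching those would need an extra pass over adjacent words.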

4. Algorithms used
● Lexical Analysis: By comparing uni-grams to the pre-loaded word
database, the tweet is assigned a sentiment score (positive, negative or
neutral) and an overall score is calculated.
● Naive Bayes Machine Learning Algorithm: Training data sets are
used to teach the machine what kind of sentences are categorized as
positive and what kind are categorized as negative. On arrival of a new
tweet or sentence, the machine uses this algorithm to assign a category
to the new data and attach a level to the emotion.
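
To make the Naive Bayes step more concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing. It is written in Python rather than R. The class name, the whitespace tokenizer and the four training sentences are assumptions made for illustration; the project's actual training data and implementation are not described in this synopsis.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over whitespace-separated words."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of examples
        self.vocab = set()

    def train(self, examples):
        for text, label in examples:
            words = text.lower().split()
            self.doc_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def predict(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_logp = None, -math.inf
        for label in self.doc_counts:
            # log prior + sum of log likelihoods, with add-one smoothing
            logp = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for w in words:
                count = self.word_counts[label][w]
                logp += math.log((count + 1) / (total_words + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Tiny made-up training set (illustrative only).
TRAIN = [
    ("great movie loved it", "positive"),
    ("wonderful acting great story", "positive"),
    ("terrible movie hated it", "negative"),
    ("boring plot terrible acting", "negative"),
]

model = NaiveBayesSentiment()
model.train(TRAIN)
print(model.predict("great story"))           # prints: positive
print(model.predict("boring terrible plot"))  # prints: negative
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and the smoothing keeps an unseen word from forcing a probability of zero.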

5. Classification based on location and type of data
The collected data can be classified based on the location of users. Analysis
can thus be made by continent, country, state or a particular area of users.
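
One simple way to picture location based analysis is to group already-scored records by a location field and average the sentiment per group. The sketch below is in Python rather than R. It assumes each record is a dictionary with a numeric `score` and an optional `country` key; these field names are hypothetical and not taken from the project.

```python
from collections import defaultdict


def sentiment_by_region(records, key="country"):
    """Return the mean sentiment score for each value of the location field."""
    totals = defaultdict(lambda: [0, 0])  # region -> [score sum, record count]
    for rec in records:
        region = rec.get(key) or "unknown"  # records without a location are pooled
        totals[region][0] += rec["score"]
        totals[region][1] += 1
    return {region: s / n for region, (s, n) in totals.items()}


# Made-up records for illustration.
records = [
    {"country": "IN", "score": 2},
    {"country": "IN", "score": -1},
    {"country": "US", "score": 3},
    {"score": 1},
]
print(sentiment_by_region(records))
# prints: {'IN': 0.5, 'US': 3.0, 'unknown': 1.0}
```

Passing a different `key` (for example a state or continent field, if the data has one) would give the same breakdown at another level.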

6. Fake review identification
Fake reviews are identified by applying supervised machine learning algorithms.

7. Results representation
In the table tab of our Shiny web app, we present the scores, the tweets and
the percentage of positive/negative emotion in the text. This is calculated
using simple arithmetic to give a better understanding of the overall sentiment.
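
The percentage arithmetic can be illustrated as follows. This Python sketch shows one possible reading, in which each tweet has a numeric score and the percentages are the shares of tweets scoring above, below and exactly zero. The function name and the zero threshold are assumptions.

```python
def sentiment_percentages(scores):
    """Return the percentage of positive, negative and neutral scores."""
    total = len(scores)
    if total == 0:
        return {"positive": 0.0, "negative": 0.0, "neutral": 0.0}
    positive = sum(1 for s in scores if s > 0)
    negative = sum(1 for s in scores if s < 0)
    counts = {
        "positive": positive,
        "negative": negative,
        "neutral": total - positive - negative,
    }
    return {label: 100.0 * n / total for label, n in counts.items()}


print(sentiment_percentages([2, -1, 0, 3]))
# prints: {'positive': 50.0, 'negative': 25.0, 'neutral': 25.0}
```

The same three numbers could feed the pie chart and bar diagram mentioned in the methodology.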

SYSTEM REQUIREMENTS
● Installation of R
● Installation of RStudio
● Installation of Hadoop HDFS
● Media authentication to access APIs (Twitter, Amazon etc.)

Software Requirement
• Operating System : Windows XP or above
• Web Server : Apache
• IDE : NetBeans
• Language : R
• Interface framework : Shiny
• Storage : Hadoop (HDFS)

Hardware Requirements
• Processor : Intel Core i5
• Speed : 2.1 GHz
• RAM : 4 GB
