The document presents a proposed study on sentiment analysis of tweets using machine learning algorithms like Naive Bayes and Support Vector Machines. It discusses collecting Twitter data, preprocessing the data through steps like tokenization and filtering, extracting features, and classifying sentiment using Naive Bayes and SVMs. The methodology involves training models on labeled tweet data and evaluating accuracy on unlabeled test data.


Presentation On
Sentiment Analysis Of Twitter

B.Tech (IT) – VI Semester
Session: DEC–APR (2019)

Submitted To:
Department of Mathematics and Computing

Submitted By:
Akshita Khanna (11007)
Anupriya (11014)
Nischal Mehta (11061)
Pratibha Sharma (11067)
Content
1. Introduction
2. Proposed Study
a. Data Collection
b. Pre-Processing
c. Feature Selection
d. Feature Extraction
e. Classification
3. Research Methodology
4. Conclusion
Problem Statement
 A major benefit of social media is that we can see the good and bad things people say about a particular brand or personality.
 The bigger your company gets, the more difficult it becomes to keep a handle on how everyone feels about your brand. For large companies with thousands of daily mentions on social media, news sites, and blogs, it is extremely difficult to do this manually.
 To address this problem, sentiment analysis software is necessary. Such software can be used to evaluate people's sentiment about a particular brand or personality.
Introduction: What is Tweezer?

 TWEEZER = TWEEts + analyZER

 This product (Tweezer) introduces a novel approach for automatically classifying the sentiment of Twitter messages. Messages are classified as positive, neutral, or negative with respect to a query term or keyword entered by the user.
Data Collection

1. Data Streaming: For performing sentiment analysis we need Twitter data consisting of tweets about a particular keyword or query term. To collect the data and tweets we used the public Twitter API, which is available to the general public for free. This constitutes the data collection step.

#NOTE: Tweets are short messages, restricted to 140 characters in length. Due to the nature of this microblogging service (quick and short messages), people use acronyms, make spelling mistakes, and use emoticons and other characters that express special meanings.
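The collection step boils down to gathering tweets that mention a query term. A minimal Python sketch, assuming tweets have already been fetched into JSON records with a "text" field (the sample records are hypothetical; a real Twitter API response is much richer JSON):

```python
import json

# Hypothetical sample standing in for tweets fetched from the Twitter API;
# a real API response is richer JSON, but each tweet carries a "text" field.
raw = '[{"text": "Loving the new phone! #gadget"}, {"text": "Terrible service today."}, {"text": "The new phone camera is great."}]'
collected = json.loads(raw)

def tweets_about(tweets, keyword):
    # Keep only tweets that mention the query term (case-insensitive).
    return [t["text"] for t in tweets if keyword.lower() in t["text"].lower()]

print(tweets_about(collected, "phone"))
```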
Data Pre-Processing

Tokenization → Filtering Tokens → Stemming → Conversion
Data Pre-Processing (cont.)

1. Tokenization: This process splits the text of a document into a sequence of tokens. The splitting points are defined by all non-letter characters, producing tokens consisting of single words (unigrams).

2. Filtering Tokens: A length-based filtering scheme is applied to reduce the generated token set.
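A minimal sketch of these two steps (the length bounds in the filter are illustrative assumptions, not values from the project):

```python
import re

def tokenize(text):
    # Splitting points are all non-letter characters; tokens are single words (unigrams).
    return [t for t in re.split(r"[^A-Za-z]+", text) if t]

def filter_tokens(tokens, min_len=2, max_len=15):
    # Length-based filtering: drop tokens shorter than min_len or longer than max_len.
    # (The bounds 2 and 15 are illustrative assumptions.)
    return [t for t in tokens if min_len <= len(t) <= max_len]

tokens = tokenize("A great win by @imVkohli!")
print(tokens)                 # ['A', 'great', 'win', 'by', 'imVkohli']
print(filter_tokens(tokens))  # ['great', 'win', 'by', 'imVkohli']
```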
Data Pre-Processing (cont.)

Removing URLs, hashtags, references, special characters: The first step is cleaning the data of hashtags, numbers (1, 2, 3, etc.), URLs, and targets (@), which helps remove most of the noise. Non-word symbols such as full stops, commas, and inverted commas are also removed.

For example, consider the tweet:

A great win and a fabulous innings by @imVkohli. Yet another at his adopted home ground. Excellent role played by @msdhoni and @DineshKarthik to take India over the line. #INDvAUS https://t.co/7n3M2l3hZS

This will change to:
A great win and a fabulous innings by Yet another at his adopted home ground Excellent role played by to take India over the line

Removal of stop words: A list of stop words (for, she, he, is, of, the, etc.) is created, and these words are ignored.

The above example will now change to:
great win fabulous innings another adopted home ground excellent role played take India over line
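The cleaning steps above can be sketched as one small function (the regular expressions and the stop-word subset are illustrative choices, not the exact ones used in the project):

```python
import re

# Illustrative stop-word subset (a real list is much larger).
STOP_WORDS = {"for", "she", "he", "is", "of", "the", "a", "and", "to", "by", "at"}

def clean_tweet(text):
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"[@#]\w+", " ", text)        # remove @targets and #hashtags
    text = re.sub(r"[^A-Za-z\s]", " ", text)    # remove numbers and punctuation
    words = [w for w in text.split() if w.lower() not in STOP_WORDS]
    return " ".join(words)

print(clean_tweet("A great win by @imVkohli. #INDvAUS https://t.co/xyz"))  # great win
```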
Feature Extraction
 Selection of useful words from the tweets in the pre-processed data set is called feature extraction. In the feature extraction step, we extract aspects from the pre-processed Twitter dataset.
 There are different ways of extracting features: unigrams, bigrams, and n-grams. For example: "she is not bad."
 If the word 'bad' occurs, the sentiment is not necessarily negative. If we consider 2-grams, the feature 'not bad' also has to be taken into account, i.e., this statement is most likely a positive statement. Therefore, using n-grams as features in classification can improve the result.
 Part-of-speech tags like adjectives, adverbs, verbs, and nouns are good indicators of subjectivity and sentiment, which determine the polarity of the tweet.
 Negation is a very important and difficult feature to interpret. The presence of a negation in a tweet changes the polarity of the sentiment.
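The unigram/bigram idea can be sketched as a small n-gram extractor (a minimal illustration, not the project's actual code):

```python
def ngrams(tokens, n):
    # Slide a window of size n over the token list and join each window into one feature.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "she is not bad".split()
print(ngrams(tokens, 1))  # ['she', 'is', 'not', 'bad']
print(ngrams(tokens, 2))  # ['she is', 'is not', 'not bad']
```

Note that the 2-gram list contains 'not bad' as a single feature, which is what lets a classifier treat the negated phrase differently from the bare word 'bad'.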
Naïve Bayes
 Naïve Bayes is a machine-learning-based probabilistic approach for sentiment analysis. It is based on Bayes' theorem with an assumption of independence among predictors.
 In sentiment analysis of tweets, it classifies tweets into two classes (positive/negative), using the frequently used positive/negative words in the training dataset as features.
 Well-known applications of Naïve Bayes include news categorization and email spam detection.
Bayes' Theorem
According to Bayes' theorem,

P(A|B) = P(B|A) * P(A) / P(B)

Where,
 P(A|B) is the posterior probability of the class (A, target) given the predictor (B, attributes).
 P(A) is the prior probability of the class.
 P(B|A) is the likelihood, i.e., the probability of the predictor given the class.
 P(B) is the prior probability of the predictor.

Naïve Bayes extends Bayes' theorem: in Naïve Bayes we can have multiple classes (C1, C2, ..., Cn) and multiple features (X1, X2, ..., Xn), assumed conditionally independent given the class, whereas the theorem above involves only a single class and a single predictor.
Example: Naïve Bayes
Example: will the players play the game or not, depending on the weather condition? The training data consists of 14 observations of (Weather, Play):

Sunny/No, Overcast/Yes, Rainy/Yes, Sunny/Yes, Sunny/Yes, Overcast/Yes, Rainy/No, Rainy/No, Sunny/Yes, Rainy/Yes, Sunny/No, Overcast/Yes, Overcast/Yes, Rainy/No

Frequency Table
Weather      No    Yes
Overcast     0     4
Rainy        3     2
Sunny        2     3
Grand Total  5     9

Likelihood Table
P(Overcast) = 4/14 = 0.29
P(Rainy)    = 5/14 = 0.36
P(Sunny)    = 5/14 = 0.36
P(No) = 5/14 = 0.36,  P(Yes) = 9/14 = 0.64

If we want to calculate the probability of playing ("Yes") given that it is a sunny day:
P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
Here P(Sunny | Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, and P(Yes) = 9/14 = 0.64.
Now, P(Yes | Sunny) = 0.33 * 0.64 / 0.36 = 0.60, which is higher than P(No | Sunny) = (2/5) * 0.36 / 0.36 = 0.40, so the prediction is that the players will play.
In our case, we have train and test data. The train data is used for generating the features that train the algorithm. Assume we have n tweets, of which k are positive and (n-k) are negative. We classify the tweets into two classes (positive/negative).

Example: @xyz when you are happy, you look beautiful !!!
@xyz I am sad.

A = P(positive | tweet) = P(tweet | positive) * P(positive) / P(tweet)
B = P(negative | tweet) = P(tweet | negative) * P(negative) / P(tweet)

Dropping the common denominator P(tweet) and applying the independence assumption to the sentiment-bearing words:
P(positive | tweet) ∝ P(happy | positive) * P(beautiful | positive) * P(positive)
Similarly,
P(negative | tweet) ∝ P(sad | negative) * P(negative)

Feature counts from the training data:
Features    Positive  Negative
beautiful   4         1
sad         2         5
happy       3         0
total       9         6

If A > B, the tweet is positive; otherwise, the tweet is negative.
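The scoring above can be sketched in Python using the feature counts from the table. Two details are our own assumptions, not from the slide: add-one (Laplace) smoothing, needed because happy|negative has a zero count, and priors estimated from the positive/negative word totals:

```python
# Feature counts from the training data (the table above).
counts = {
    "beautiful": {"positive": 4, "negative": 1},
    "sad":       {"positive": 2, "negative": 5},
    "happy":     {"positive": 3, "negative": 0},
}
totals = {"positive": 9, "negative": 6}
priors = {"positive": 9 / 15, "negative": 6 / 15}  # assumed from the word totals
vocab = len(counts)

def score(words, cls):
    # P(class) times the product of P(word | class) over known feature words.
    # Add-one smoothing keeps the zero count (happy|negative) from zeroing the product.
    p = priors[cls]
    for w in words:
        if w in counts:
            p *= (counts[w][cls] + 1) / (totals[cls] + vocab)
    return p

def classify(tweet):
    words = tweet.lower().split()
    a = score(words, "positive")   # A = P(positive | tweet)
    b = score(words, "negative")   # B = P(negative | tweet)
    return "positive" if a > b else "negative"

print(classify("when you are happy you look beautiful"))  # positive
print(classify("I am sad"))                               # negative
```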
SUPPORT VECTOR MACHINE
 1. "Support Vector Machine" (SVM) is a supervised machine learning algorithm. It can be used for both classification and regression.
 2. SVM is a supervised learning method that sorts data into two categories.
 3. The task of an SVM algorithm is to determine which category a new data point belongs in.
 4. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples.
HOW DOES IT WORK?

SCENARIO ONE
Here we have three hyperplanes (A, B, and C). We need to identify the right hyperplane to classify the stars and circles.
HOW DOES IT WORK?
SCENARIO TWO
Here we have three hyperplanes (A, B, and C), and all of them segregate the classes well. So how can we identify the right hyperplane? The usual rule is to choose the hyperplane that maximizes the margin, i.e., the distance to the nearest training point of either class.
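A toy Python sketch of that comparison: among candidate lines, pick the one farthest from its nearest point. The points and the three candidate lines here are hypothetical, chosen only to illustrate the margin computation; a real SVM also requires the candidates to actually separate the two classes:

```python
import math

def distance(point, line):
    # Distance from point (x, y) to the line a*x + b*y + c = 0.
    (x, y), (a, b, c) = point, line
    return abs(a * x + b * y + c) / math.hypot(a, b)

# Hypothetical training points and three candidate separating lines A, B, C.
points = [(1, 1), (2, 2), (4, 5), (5, 6)]
candidates = {"A": (1, -1, 0), "B": (1, -1, 1), "C": (1, -1, -1)}

# Margin of a candidate = distance to its nearest training point;
# the maximum-margin rule picks the candidate with the largest margin.
margins = {name: min(distance(p, line) for p in points) for name, line in candidates.items()}
print(max(margins, key=margins.get))  # C
```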
Sentiment Analysis using SVM
 Sentiment analysis is treated as a classification task, as it classifies the orientation of a text into positive or negative.
 The goal of SVM is to separate the negative and positive training examples by finding a separating hyperplane (an (n-1)-dimensional hyperplane in the n-dimensional feature space).
 The following confusion matrix is obtained after running the SVM algorithm (rows: actual class; columns: predicted class):

                 Predicted POSITIVE   Predicted NEGATIVE
Actual POSITIVE        3000                   0
Actual NEGATIVE         900                   0
CALCULATING ACCURACY
Accuracy is calculated as the number of correctly predicted reviews divided by the total number of reviews present in the corpus:

Accuracy = (correctly predicted reviews) / (total reviews)

Accuracy of algorithms

We also provide a comparative study of the algorithms discussed, on the basis of the confusion matrix (rows: actual class; columns: predicted class):

                 Predicted Positive      Predicted Negative
Actual Positive  True Positive (TP)      False Negative (FN)
Actual Negative  False Positive (FP)     True Negative (TN)

Confusion Matrix

Accuracy can be calculated from the confusion matrix:

Accuracy = (TP + TN) / (TP + TN + FN + FP)

 TP = TRUE POSITIVE
 TN = TRUE NEGATIVE
 FP = FALSE POSITIVE
 FN = FALSE NEGATIVE
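Plugging in the numbers from the SVM confusion matrix shown earlier (3000 tweets correctly labeled positive, 900 negatives mislabeled positive), a minimal sketch of the calculation:

```python
def accuracy(tp, tn, fp, fn):
    # Fraction of all predictions that are correct.
    return (tp + tn) / (tp + tn + fp + fn)

# Values from the SVM confusion matrix shown earlier.
print(round(accuracy(tp=3000, tn=0, fp=900, fn=0), 3))  # 0.769
```

So the SVM run above classifies roughly 77% of the tweets correctly.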
A study by Twitter in 2015 showed that 15% of tweets during TV prime time contain at least one emoji, which is why emojis are a major factor to consider. The polarity of an emoticon is based on the score it carries. The polarity of a tweet is the sum of the polarity of the textual part and the emoticon part.

[Table: common emoticons and their polarity scores]

As we can see, the scores of the negative emoticons are already negative, so when they are added to the polarity of the textual part of the tweet, the polarity of the tweet changes accordingly.
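The score-summing rule can be sketched as follows (the emoticon scores here are hypothetical, since the slide's score table is not reproduced):

```python
# Hypothetical emoticon scores; the slide's actual score table is not reproduced here.
EMOTICON_SCORES = {":)": 1.0, ":D": 1.5, ":(": -1.0, ":'(": -1.5}

def tweet_polarity(text_score, emoticons):
    # Total polarity = polarity of the textual part + sum of emoticon scores.
    return text_score + sum(EMOTICON_SCORES.get(e, 0.0) for e in emoticons)

# A mildly positive text with a sad emoticon flips to negative overall.
print(tweet_polarity(0.5, [":("]))  # -0.5
```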
