Natural language processing-Section (4)
Defining The Lists for Positive and Negative Words
First, we define two lists, one of positive words and one of
negative words, to compare the words in our text against.
positive_words = ["well", "good", "great", "like", "better", "enough", "happy", "love", "pleasure", "happiness"]
negative_words = ["miss", "poor", "doubt", "object", "sorry", "impossible", "afraid", "scarcely", "bad", "anxious"]
Opening and Reading The File
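A minimal sketch of this step, assuming the text is stored in a UTF-8 file. The file name `sample_review.txt`, the helper `read_text`, and the choice to lowercase the text (so that it matches the lowercase word lists) are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def read_text(path):
    """Return the whole file as one lowercase string (hypothetical helper)."""
    return Path(path).read_text(encoding="utf-8").lower()


# Demo: write a tiny sample file into a temporary folder, then read it back.
with tempfile.TemporaryDirectory() as folder:
    sample = Path(folder) / "sample_review.txt"  # hypothetical file name
    sample.write_text("I love this book. It is Great!", encoding="utf-8")
    text = read_text(sample)

print(text)
```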
Tokenizing The Text
from nltk.tokenize import word_tokenize  # needs NLTK's Punkt tokenizer data to be downloaded
words = word_tokenize(text)
Checking if The Text Contains Positive or Negative Words
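One way this check might look is sketched below. The helper name `find_sentiment_words` and the hand-written token list (standing in for the tokenizer's output) are assumptions for illustration. The word lists are the ones defined earlier.

```python
positive_words = ["well", "good", "great", "like", "better", "enough",
                  "happy", "love", "pleasure", "happiness"]
negative_words = ["miss", "poor", "doubt", "object", "sorry", "impossible",
                  "afraid", "scarcely", "bad", "anxious"]


def find_sentiment_words(words):
    """Return the tokens that appear in each word list (hypothetical helper)."""
    found_positive = [w for w in words if w in positive_words]
    found_negative = [w for w in words if w in negative_words]
    return found_positive, found_negative


# Stand-in for the tokenized text.
words = ["i", "love", "this", "book", "but", "the", "ending", "was", "bad"]
found_positive, found_negative = find_sentiment_words(words)

if found_positive:
    print("Positive words found:", found_positive)
if found_negative:
    print("Negative words found:", found_negative)
```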
Improving The Program
Keeping Positive and Negative Scores
Keeping Positive and Negative Scores (Cont.)
positive_score = 0
negative_score = 0
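The two counters above could be used as in the following sketch: each token found in a list adds one to the matching score, and the larger score decides the label. The function name `classify` and the "neutral" label for ties are assumptions, not part of the original program.

```python
positive_words = ["well", "good", "great", "like", "better", "enough",
                  "happy", "love", "pleasure", "happiness"]
negative_words = ["miss", "poor", "doubt", "object", "sorry", "impossible",
                  "afraid", "scarcely", "bad", "anxious"]


def classify(words):
    """Count list hits and return (label, positive_score, negative_score)."""
    positive_score = 0
    negative_score = 0
    for word in words:
        if word in positive_words:
            positive_score += 1
        if word in negative_words:
            negative_score += 1
    if positive_score > negative_score:
        label = "positive"
    elif negative_score > positive_score:
        label = "negative"
    else:
        label = "neutral"  # assumption: a tie is reported as neutral
    return label, positive_score, negative_score


# Stand-in for the tokenized text.
words = ["good", "story", "good", "pace", "poor", "ending"]
label, positive_score, negative_score = classify(words)
print(label, positive_score, negative_score)
```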
Improving The Program Even Further
Using Word Similarity
Stop Words
Removing Stop Words
from nltk.corpus import stopwords  # needs the NLTK "stopwords" corpus to be downloaded
stop_words = stopwords.words("english")
filtered_words = []
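A sketch of how `filtered_words` might then be filled: every token that is not a stop word is kept. To stay self-contained, this example uses a short hand-written stop-word list and token list as stand-ins. In the program itself the stop words come from `stopwords.words("english")`.

```python
# Stand-in for stopwords.words("english"); the real list is much longer.
stop_words = ["this", "is", "a", "the", "and"]

# Stand-in for the tokenized text.
words = ["this", "is", "a", "good", "book"]

filtered_words = []
for word in words:
    # Keep only the tokens that carry meaning for the sentiment check.
    if word not in stop_words:
        filtered_words.append(word)

print(filtered_words)
```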
Using Word Similarity
For each word in our text, we check how similar it is to each word in
the positive list, and then do the same for the negative list.
We keep the similarity scores for each list, take the maximum of each,
and add it to the corresponding running score.
positive_score = 0
negative_score = 0
positive_similarity = []
negative_similarity = []
Using Word Similarity (Cont.)
positive_score += max(positive_similarity)
negative_score += max(negative_similarity)
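The fragments above can be put together roughly as in the sketch below. The similarity measure is passed in as a function. The `toy_similarity` table holds made-up values for illustration only and is not real WordNet output. The names `score_text` and `toy_similarity` are assumptions, and `max(..., default=0.0)` is used here so that an empty word list does not raise an error.

```python
def score_text(words, positive_words, negative_words, similarity):
    """Add up, per word, the best similarity to each list."""
    positive_score = 0
    negative_score = 0
    for word in words:
        positive_similarity = [similarity(word, p) for p in positive_words]
        negative_similarity = [similarity(word, n) for n in negative_words]
        # default=0.0 avoids a ValueError when a word list is empty.
        positive_score += max(positive_similarity, default=0.0)
        negative_score += max(negative_similarity, default=0.0)
    return positive_score, negative_score


# Made-up similarity values, for illustration only.
toy_scores = {
    ("glad", "happy"): 0.9,
    ("glad", "good"): 0.6,
    ("glad", "bad"): 0.1,
    ("awful", "bad"): 0.8,
    ("awful", "good"): 0.3,
}


def toy_similarity(a, b):
    """Look up a pair in either order; unknown pairs score 0.0."""
    if a == b:
        return 1.0
    return toy_scores.get((a, b), toy_scores.get((b, a), 0.0))


positive_score, negative_score = score_text(
    ["glad", "awful"], ["happy", "good"], ["bad", "sorry"], toy_similarity
)
print(positive_score, negative_score)
```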
Fixing Problems in Our Code
Fixing Problems in Our Code (Cont.)
positive_score += max(positive_similarity)
negative_score += max(negative_similarity)
Fixing Problems in Our Code (Cont.)
Fixing Problems in Our Code (Cont.)
positive_score += max(positive_similarity)
negative_score += max(negative_similarity)
Checking Our Results
Wu-Palmer Similarity
Code #1: Introducing Synsets
Code #2: Wu Similarity
syn1.wup_similarity(syn2)
Output:
0.26666666666666666
"hello" and "selling" are apparently about 27% similar!
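To give a feel for where such a number comes from: Wu-Palmer similarity is commonly described as 2 * depth(LCS) / (depth(a) + depth(b)), where LCS is the deepest ancestor the two concepts share in the taxonomy. The sketch below applies that formula to a tiny hand-built taxonomy rather than to WordNet. The taxonomy, the helper names, and the convention that the root has depth 1 are all assumptions for illustration, and NLTK's implementation may differ in its details.

```python
# Tiny hand-built taxonomy: each concept maps to its parent (None = root).
parent = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "tree": "plant",
}


def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def depth(node):
    """Number of nodes from the root down to node (root has depth 1)."""
    return len(path_to_root(node))


def wup_similarity(a, b):
    """Wu-Palmer similarity on the toy taxonomy."""
    ancestors_of_b = set(path_to_root(b))
    # Walking up from a, the first shared node is the deepest common ancestor.
    lcs = next(n for n in path_to_root(a) if n in ancestors_of_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


print(wup_similarity("dog", "cat"))   # share "animal"
print(wup_similarity("dog", "tree"))  # share only the root "entity"
```

Concepts that share a deeper ancestor score higher: here "dog" and "cat" meet at "animal", while "dog" and "tree" only meet at the root.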
Try it out yourself
Code:
https://colab.research.google.com/drive/19sLiFnHyDzi1M99yRjeHlB7ekCSYXrmD
Task #1
Use the mini project we did to loop through all text files
in a directory and print the document name and
whether it contains positive or negative text.
Extra: See if you can improve the mini project even
further.
Task #2
Thank you for your attention!
References
https://www.tidytextmining.com/sentiment.html