
Language Engineering

Prepared by: Abdelrahman M. Safwat

Section (4) – Mini Project


Idea

 We want to create a Python program that takes a file and checks whether the text indicates a positive sentiment or a negative sentiment.

2
Defining The Lists for Positive and Negative Words

 First, we need to define a list of positive and negative words to compare words from our text against.

positive_words = ["well", "good", "great", "like", "better", "enough", "happy", "love", "pleasure", "happiness"]

negative_words = ["miss", "poor", "doubt", "object", "sorry", "impossible", "afraid", "scarcely", "bad", "anxious"]

3
Opening and Reading The File

 Next, we need to open the text file and store its content in a variable. Make sure the text file is in the same directory as the Python script.

file = open("1.txt")
text = file.read()
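 As a side note, a with block closes the file automatically, and an explicit encoding avoids platform-dependent defaults; a minimal alternative sketch:

with open("1.txt", encoding="utf-8") as file:
    text = file.read()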

4
Tokenizing The Text

 Next, we need to tokenize all the text into words.


from nltk.tokenize import word_tokenize

words = word_tokenize(text)
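 If word_tokenize() raises a LookupError, the tokenizer models are missing; a one-time download (the exact package name can vary slightly between NLTK versions) fixes it:

import nltk
nltk.download("punkt")  # Punkt tokenizer models used by word_tokenize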

5
Checking if The Text contains
Positive or Negative Words

 Next, we need to loop through all the words and check if they’re included in either of the lists we created.

for word in words:
    if word in positive_words:
        print("The text is positive")
        break
    elif word in negative_words:
        print("The text is negative")
        break

6
Improving The Program

 You’ll notice that there’s a problem with the code on the previous slide: because it stops at the first match, this method would be inaccurate if the text contains a mix of both good and bad words.

7
Keeping Positive and Negative Scores

 To solve the problem we mentioned, we can keep score of how many positive and negative words there are in the text.

8
Keeping Positive and Negative Scores
(Cont.)

positive_score = 0
negative_score = 0

for word in words:
    if word in positive_words:
        positive_score += 1
    elif word in negative_words:
        negative_score += 1

if positive_score > negative_score:
    print("The text is positive")
else:
    print("The text is negative")

9
Improving The Program even Further

 You’ll notice that there’s still a problem with the code on the previous slide: some words might be positive even though we haven’t put them in our list.

10
Using Word Similarity

 What we’ll do is, for each word in our text, check how similar it is to each word in the positive and negative lists.
 To do so, we’ll also need to remove all irrelevant words.

11
Stop Words

A stop word is a commonly used word (such as “the”, “a”, “an”, “in”) that a search engine has been programmed to ignore, both when indexing entries for searching and when retrieving them as the result of a search query.
Stop-word lists are commonly used in Natural Language Processing (NLP) to filter out words that occur so frequently that they carry very little useful information.

12
Example for Stop Word
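 A small illustrative sketch (the sentence below is made up for illustration): tokenize a sentence and drop the English stop words.

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

sentence = "This is a sample sentence, showing off the stop words filtration."
stop_words = set(stopwords.words("english"))
filtered = [w for w in word_tokenize(sentence) if w.lower() not in stop_words]
print(filtered)
# prints something like: ['sample', 'sentence', ',', 'showing', 'stop', 'words', 'filtration', '.']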

13
Removing Stop Words

from nltk.corpus import stopwords

stop_words = stopwords.words("english")
filtered_words = []

for word in words:
    if word not in stop_words:
        filtered_words.append(word)
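 The stop-word list is NLTK corpus data and may need a one-time nltk.download("stopwords"). Also note that for this filtering to help, the similarity loops on the later slides should run over the filtered list; a minimal way to do that without changing the rest of the code is to reassign it:

words = filtered_words  # the similarity loops below will now skip stop words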

14
Using Word Similarity

 What we’ll do is, for each word in our text, check how similar it is to each word in the positive list and then again for the negative list.
 We’ll keep each similarity score in a list, and get the maximum score.

positive_score = 0
negative_score = 0
positive_similarity = []
negative_similarity = []

15
Using Word Similarity (Cont.)

from nltk.corpus import wordnet

for word in words:
    for positive_word in positive_words:
        word1 = wordnet.synsets(word)[0]
        word2 = wordnet.synsets(positive_word)[0]
        positive_similarity.append(word1.wup_similarity(word2))
    for negative_word in negative_words:
        word1 = wordnet.synsets(word)[0]
        word2 = wordnet.synsets(negative_word)[0]
        negative_similarity.append(word1.wup_similarity(word2))

positive_score += max(positive_similarity)
negative_score += max(negative_similarity)

16
Fixing Problems in Our Code

 You’ll notice that there’s an error complaining about a “None” value in our list.
 That comes from wup_similarity(), which returns None when two synsets can’t be compared, while max() expects numbers only.
 To fix this, we simply need to use the filter() function to remove all None values.

17
Fixing Problems in Our Code (Cont.)

for word in words:
    for positive_word in positive_words:
        word1 = wordnet.synsets(word)[0]
        word2 = wordnet.synsets(positive_word)[0]
        positive_similarity.append(word1.wup_similarity(word2))
    positive_similarity = list(filter(None, positive_similarity))
    for negative_word in negative_words:
        word1 = wordnet.synsets(word)[0]
        word2 = wordnet.synsets(negative_word)[0]
        negative_similarity.append(word1.wup_similarity(word2))
    negative_similarity = list(filter(None, negative_similarity))

positive_score += max(positive_similarity)
negative_score += max(negative_similarity)
18
Fixing Problems in Our Code (Cont.)

 You’ll notice that there’s another error, this time complaining about an invalid index.
 That’s because some of the words we’re comparing don’t have an entry in WordNet, so wordnet.synsets() returns an empty list.
 We can simply check first whether there are entries or not.

19
Fixing Problems in Our Code (Cont.)

for word in words:
    for positive_word in positive_words:
        if wordnet.synsets(word) and wordnet.synsets(positive_word):
            word1 = wordnet.synsets(word)[0]
            word2 = wordnet.synsets(positive_word)[0]
            positive_similarity.append(word1.wup_similarity(word2))
    positive_similarity = list(filter(None, positive_similarity))
    for negative_word in negative_words:
        if wordnet.synsets(word) and wordnet.synsets(negative_word):
            word1 = wordnet.synsets(word)[0]
            word2 = wordnet.synsets(negative_word)[0]
            negative_similarity.append(word1.wup_similarity(word2))
    negative_similarity = list(filter(None, negative_similarity))

positive_score += max(positive_similarity)
negative_score += max(negative_similarity)

20
Checking Our Results

if positive_score > negative_score:
    print("The text is positive")
else:
    print("The text is negative")

21
Wu-Palmer Similarity

The wup_similarity method is short for Wu-Palmer Similarity, which is a scoring method based on how similar the word senses are and on where the synsets occur relative to each other in the hypernym tree.
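 In its usual formulation, the score is wup(s1, s2) = 2 * depth(lcs) / (depth(s1) + depth(s2)), where lcs is the least common subsumer (the deepest ancestor the two synsets share), so the score lies between 0 and 1 and is higher for senses that share a deep common ancestor.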

22
Code #1: Introducing Synsets

from nltk.corpus import wordnet

syn1 = wordnet.synsets('hello')[0]
syn2 = wordnet.synsets('selling')[0]
print("hello name : ", syn1.name())
print("selling name : ", syn2.name())

 Output:
hello name : hello.n.01
selling name : selling.n.01

23
Code #2: Wu Similarity

syn1.wup_similarity(syn2)

Output:
0.26666666666666666
 “hello” and “selling” are apparently 27% similar!

24
Try it out yourself

 Code:
https://colab.research.google.com/drive/19sLiFnHyDzi1M99yRjeHlB7ekCSYXrmD

25
Task #1

 Use the mini project we did to loop through all the text files in a directory and print each document’s name and whether it contains positive or negative text (a starter sketch follows below).
 Extra: See if you can improve the mini project even further.
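 A possible starter sketch, reusing the simple word-count scoring from slide 9 (the folder name "texts" is just a placeholder, and classify() is our own helper, not something from the slides):

import os
from nltk.tokenize import word_tokenize

positive_words = ["well", "good", "great", "like", "better", "enough", "happy", "love", "pleasure", "happiness"]
negative_words = ["miss", "poor", "doubt", "object", "sorry", "impossible", "afraid", "scarcely", "bad", "anxious"]

def classify(text):
    # Count how many known positive and negative words the text contains.
    words = word_tokenize(text)
    positive_score = sum(1 for word in words if word in positive_words)
    negative_score = sum(1 for word in words if word in negative_words)
    return "positive" if positive_score > negative_score else "negative"

for filename in os.listdir("texts"):
    if filename.endswith(".txt"):
        with open(os.path.join("texts", filename), encoding="utf-8") as f:
            print(filename, "- The text is", classify(f.read()))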

26
Task #2

 Write a Python program to check the list of stop words for the Arabic language (a hint follows below).
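 A hint, assuming your NLTK version ships an Arabic stop-word list (recent releases do):

from nltk.corpus import stopwords

print(stopwords.fileids())  # languages with a stop-word list available
arabic_stop_words = stopwords.words("arabic")
print(len(arabic_stop_words), arabic_stop_words[:10])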

27
Thank you for your attention!

28
