NLP Programs

Natural Language Processing using the Natural Language Toolkit (NLTK) module with Python.


If Python is not installed, go to Python.org and download the latest version of Python if you are on Windows.
If you are on Mac or Linux, you should be able to install it through your package manager, for example apt-get install python3 on Debian-based systems.

The NLTK module is a massive toolkit, aimed at helping you with the entire Natural Language Processing (NLP)
methodology.

NLTK will aid you with everything from splitting paragraphs into sentences and splitting up words, to recognizing the part of speech of those words, highlighting the main subjects, and even helping your machine understand what the text is all about.

The easiest way to install the NLTK module is with pip.

For all users, that is done by opening up cmd.exe, bash, or whatever shell you use and typing:
pip install nltk
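
You can check that the install worked from the same shell (assuming python launches the interpreter you installed NLTK into):

python -c "import nltk; print(nltk.__version__)"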

Next, we need to install some of the components for NLTK. Open python via whatever means you normally do, and
type:

import nltk
nltk.download()
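
Running nltk.download() with no arguments opens an interactive downloader where you choose what to fetch. If you only want the pieces used in this document, you can instead download them directly by name, for example:

import nltk
nltk.download('punkt')      # tokenizer models used by sent_tokenize / word_tokenize
nltk.download('stopwords')  # the stop word lists used in section 2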

Some quick vocabulary:

 Corpus - Body of text, singular. Corpora is the plural of this. Example: A collection of medical journals.
 Lexicon - Words and their meanings. Example: English dictionary. Consider, however, that various fields
will have different lexicons. For example: To a financial investor, the first meaning for the word "Bull" is
someone who is confident about the market, as compared to the common English lexicon, where the first
meaning for the word "Bull" is an animal. As such, there is a special lexicon for financial investors,
doctors, children, mechanics, and so on.
 Token - Each "entity" that is a part of whatever was split up based on rules. For example, each word is a
token when a sentence is "tokenized" into words. Each sentence can also be a token, if you tokenize the
sentences out of a paragraph.

1) Tokenizing Words and Sentences with NLTK


 Tokenizing - Splitting sentences and words from the body of text.

from nltk.tokenize import sent_tokenize, word_tokenize

EXAMPLE_TEXT = "Hello Mr. Smith, how are you doing today? The weather is great, and Python is awesome.
The sky is pinkish-blue. You shouldn't eat cardboard."

print(sent_tokenize(EXAMPLE_TEXT))

Without NLTK, a first attempt would likely be a simple .split('. '), splitting by period followed by a space. Then maybe you
would bring in some regular expressions to split by period, space, and then a capital letter.

NLTK is going to go ahead and just save you a ton of time with this seemingly simple, yet very complex, operation.

The above code will output the sentences, split up into a list of sentences, which you can do things like iterate
through with a for loop.

['Hello Mr. Smith, how are you doing today?', 'The weather is great, and Python is awesome.', 'The sky is pinkish-blue.', "You shouldn't eat cardboard."]

print(word_tokenize(EXAMPLE_TEXT))
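
The word-tokenized output looks like this:

['Hello', 'Mr.', 'Smith', ',', 'how', 'are', 'you', 'doing', 'today', '?', 'The', 'weather', 'is', 'great', ',', 'and', 'Python', 'is', 'awesome', '.', 'The', 'sky', 'is', 'pinkish-blue', '.', 'You', 'should', "n't", 'eat', 'cardboard', '.']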

There are a few things to note here. First, notice that punctuation is treated as a separate token. Also, notice the
separation of the word "shouldn't" into "should" and "n't". Finally, notice that "pinkish-blue" is indeed treated as
the "one word" it was meant to be. Pretty cool!

2) Stop words with NLTK


Stop Words: A stop word is a commonly used word (such as “the”, “a”, “an”, “in”) that a search engine has been
programmed to ignore, both when indexing entries for searching and when retrieving them as the result of a
search query.
Here, we treat stop words as words that carry no useful meaning, and we want to remove them. In a more literal sense, stop words can also be words you stop on: for example, you might cease analysis immediately if you detect words that are commonly used sarcastically.

You can do this easily by storing a list of words that you consider to be stop words. NLTK starts you off with a bunch of
words that it considers to be stop words, which you can access via the NLTK corpus with:

from nltk.corpus import stopwords

To check the list of stopwords you can type the following commands in the python shell.

import nltk
from nltk.corpus import stopwords
print(stopwords.words('english'))
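
Depending on your NLTK version, the printed list begins along these lines:

['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', ...]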

Here is how you might use the stop_words set to remove the stop words from your text:

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

example_sent = "This is a sample sentence, showing off the stop words filtration."

stop_words = set(stopwords.words('english'))

word_tokens = word_tokenize(example_sent)

# One-line version, using a list comprehension:
filtered_sentence = [w for w in word_tokens if w not in stop_words]

# The same filter written out as a loop (this overwrites the line
# above with an identical result):
filtered_sentence = []
for w in word_tokens:
    if w not in stop_words:
        filtered_sentence.append(w)

print(word_tokens)
print(filtered_sentence)

Output:

['This', 'is', 'a', 'sample', 'sentence', ',', 'showing', 'off', 'the', 'stop', 'words', 'filtration', '.']

['This', 'sample', 'sentence', ',', 'showing', 'stop', 'words', 'filtration', '.']
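
Notice that 'This' survives the filter: the NLTK stop list is all lowercase and the membership test is case-sensitive. A common variation (a small tweak, not part of the original example) is to lowercase each token before checking it:

filtered_sentence = [w for w in word_tokens if w.lower() not in stop_words]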

3) Stemming words with NLTK

The idea of stemming is a sort of normalizing method. Many variations of words carry the same meaning, other than
when tense is involved.

The reason why we stem is to shorten the lookup, and normalize sentences.

Consider:
I was taking a ride in the car.
I was riding in the car.

One of the most popular stemming algorithms is the Porter stemmer, which has been around since 1980.
First, we're going to grab and define our stemmer:

from nltk.stem import PorterStemmer
from nltk.tokenize import sent_tokenize, word_tokenize

ps = PorterStemmer()

Now, let's choose some words with a similar stem, like:

example_words = ["python","pythoner","pythoning","pythoned","pythonly"]

Next, we can easily stem by doing something like:

for w in example_words:
    print(ps.stem(w))

Output:

python
python
python
python
pythonli

Note that "pythonly" stems to "pythonli" rather than "python": the Porter algorithm rewrites a trailing "y" to "i", and its stems are not guaranteed to be real English words (the same is true of "veri", "poorli", and "onc" in the sentence example below).

Now let's try stemming a typical sentence, rather than some words:

new_text = "It is important to by very pythonly while you are pythoning with python. All pythoners have pythoned
poorly at least once."

Output:

It
is
import
to
by
veri
pythonli
while
you
are
python
with
python
.
All
python
have
python
poorli
at
least
onc
