Gentle Start to Natural Language Processing Using Python

The document provides an introduction to Natural Language Processing (NLP) using Python, explaining its significance and practical applications such as search engines and spam filters. It highlights the use of the NLTK library for NLP tasks, including installation instructions and a simple example of analyzing webpage content. The tutorial aims to guide beginners in understanding and implementing basic NLP techniques.

8/23/2019 Gentle Start to Natural Language Processing using Python

Raheel Shaikh
Oct 20, 2018 · 5 min read

Gentle Start to Natural Language Processing using Python

What is NLP?

Natural language processing (NLP) is about developing applications and services that
are able to understand human languages. Some practical examples of NLP are speech
recognition (for example, Google voice search), understanding what a piece of content
is about, and sentiment analysis.

Benefits of NLP

As you probably know, blogs, social websites, and web pages generate millions of
gigabytes of data every day.

Many companies gather this data to understand users and their interests, and pass
these reports on to other companies so they can adjust their plans.

Suppose a person loves traveling and regularly searches for holiday destinations.
Online hotel and flight booking apps use those searches to show that person
relevant advertisements.

https://towardsdatascience.com/gentle-start-to-natural-language-processing-using-python-6e46c07addf3 1/6
Search engines are not the only application of natural language processing (NLP);
there are many other impressive ones out there.

NLP Implementations
These are some successful applications of natural language processing (NLP):

Search engines like Google and Yahoo. For example, Google's search engine can infer
that you are interested in technology and show you results relevant to you.

Social website feeds like the Facebook news feed. The news feed algorithm
uses natural language processing to understand your interests, and is more likely
to show you related ads and posts than other posts.

Speech engines like Apple Siri.

Spam filters like Google's spam filters. Beyond the usual spam filtering,
these filters now analyze the content of an email to decide whether it is spam
or not.

How do I Start with NLP using Python?

The Natural Language Toolkit (NLTK) is one of the most popular libraries for natural
language processing (NLP). It is written in Python and has a big community behind it.

NLTK is also easy to learn, which makes it a good first natural language processing
(NLP) library.

In this NLP tutorial, we will use the Python NLTK library.

Before installing NLTK, I assume that you know some Python basics.

Install nltk
If you are using Windows or Linux or Mac, you can install NLTK using pip:

$ pip install nltk


You can use NLTK on Python 2.7, 3.4, and 3.5 at the time of writing this post.

Alternatively, you can install it from the source tarball.



To check that NLTK has installed correctly, open a Python terminal and type the
following:

import nltk

If the import runs without an error, you have successfully installed the NLTK library.
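
The same check can also be scripted. The following is a minimal sketch, not part of the original walkthrough: the helper name is_installed is hypothetical, and it relies only on the standard library's importlib to see whether a module can be found before importing it.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the named top-level module can be found for import."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("nltk"):
    import nltk
    # nltk exposes its version string as nltk.__version__
    print("NLTK version:", nltk.__version__)
else:
    print("NLTK is not installed; run: pip install nltk")
```

This avoids a traceback when NLTK is missing and prints the installed version when it is present.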

Once you’ve installed NLTK, you should install the NLTK packages by running the
following code:

import nltk
nltk.download()

This will open the NLTK downloader, where you can choose which packages to install.

You can install all the packages, or only the ones you need; this tutorial uses the
stopwords corpus. Now let’s start the show.
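
If you would rather skip the interactive downloader, a single package can be fetched from code. This is a hedged sketch: the helper name ensure_nltk_resource is hypothetical, and it assumes NLTK is installed and that the machine has network access when the data is missing.

```python
def ensure_nltk_resource(resource_path, package_id):
    """Download an NLTK data package only if it is not already present.

    Both arguments are supplied by the caller, for example
    'corpora/stopwords' and 'stopwords' for the stop-word lists.
    """
    import nltk  # imported here so the helper can be defined before NLTK is installed

    try:
        nltk.data.find(resource_path)  # raises LookupError if the data is missing
        return True
    except LookupError:
        # quiet=True suppresses progress output; download() returns True on success
        return nltk.download(package_id, quiet=True)
```

Calling ensure_nltk_resource('corpora/stopwords', 'stopwords') before the word-frequency step should make sure the stop-word lists are available.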


Here we will learn how to identify what a web page is about using
NLTK in Python
First, we will grab a webpage and analyze the text to see what the page is about.

The urllib module will help us fetch the webpage.

import urllib.request

# Fetch the raw HTML (as bytes) of the page we want to analyze
response = urllib.request.urlopen('https://en.wikipedia.org/wiki/SpaceX')
html = response.read()
print(html)
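
Note that response.read() returns bytes rather than a string. The sketch below is an optional variation, not the article's own code: the helper name fetch_html and the 10-second timeout are assumptions chosen for illustration. It closes the connection when done and decodes the bytes using the charset the server declares.

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Download a page and return its content decoded to str (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # Use the charset from the Content-Type header, falling back to UTF-8
        charset = response.headers.get_content_charset() or "utf-8"
    # errors="replace" keeps going if a few bytes do not decode cleanly
    return raw.decode(charset, errors="replace")
```

With this helper, fetch_html('https://en.wikipedia.org/wiki/SpaceX') would return the page as text instead of bytes.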

It’s pretty clear from the link that the page is about SpaceX. Now let us see whether
our code can correctly identify the page’s topic.

We will use Beautiful Soup, a Python library for pulling data out of HTML and
XML files, to strip the HTML tags from our webpage text.

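from bs4 import BeautifulSoup

# The 'html5lib' parser must be installed separately (pip install html5lib)
soup = BeautifulSoup(html, 'html5lib')
# separator=' ' keeps words from adjacent tags from being fused together
text = soup.get_text(separator=' ', strip=True)
print(text)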

You will get an output somewhat like this
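
To see what stripping HTML tags means without fetching a page, here is a small standard-library stand-in. It is not Beautiful Soup and is far less robust; the class TextExtractor, the function html_to_text, and the sample markup are all made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the text chunks of an HTML string joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


sample = ("<html><head><style>p {color: red}</style></head>"
          "<body><h1>SpaceX</h1><p>Rockets and <b>spacecraft</b></p></body></html>")
print(html_to_text(sample))  # SpaceX Rockets and spacecraft
```

Joining the chunks with a space matters for the next step: without it, text from neighbouring tags would run together into a single token.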

Now that we have clean text from the crawled web page, let’s convert the text into tokens.

# Split the text on whitespace to get a list of tokens
tokens = text.split()
print(tokens)


Your text is now converted into tokens.
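
Splitting on whitespace leaves punctuation and capitalization in place, so "SpaceX," and "spacex" would count as different tokens, and a capitalized "The" would not match a lowercase stop-word list. The following optional sketch shows one way to normalize tokens with the standard library; normalize_tokens is a hypothetical helper, not something the tutorial itself uses.

```python
import string


def normalize_tokens(text):
    """Split on whitespace, lowercase, and strip surrounding punctuation."""
    tokens = []
    for raw in text.split():
        token = raw.strip(string.punctuation).lower()
        if token:  # drop tokens that were punctuation only
            tokens.append(token)
    return tokens


print(normalize_tokens("SpaceX launches rockets. SpaceX, founded in 2002, builds rockets!"))
# ['spacex', 'launches', 'rockets', 'spacex', 'founded', 'in', '2002', 'builds', 'rockets']
```

NLTK also ships its own tokenizers, which handle many more cases than this simple approach.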

Count Word Frequency

NLTK offers a function, FreqDist(), which will do the job for us. We will also remove
stop words (a, at, the, for, etc.) from our tokens so that they don't distort our
word frequency count. We will then plot the most frequently occurring words on
the webpage to get a clear picture of what the page is about.

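import nltk
from nltk.corpus import stopwords

# Load the English stop-word list once, as a set for fast lookups
sr = set(stopwords.words('english'))
# Keep only the tokens that are not stop words
clean_tokens = [token for token in tokens if token not in sr]

# Count how often each remaining token occurs
freq = nltk.FreqDist(clean_tokens)
for key, val in freq.items():
    print(str(key) + ':' + str(val))

# Plot the 20 most frequent tokens
freq.plot(20, cumulative=False)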

frequency word count output



graph of 20 most frequent words.
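
The counting idea can also be tried on a tiny example without NLTK or a plotting library. This sketch uses collections.Counter from the standard library (FreqDist is built on top of Counter); the stop-word set below is a small made-up subset, not NLTK's list, and top_words is a hypothetical helper.

```python
from collections import Counter

# A tiny illustrative stop-word set; NLTK's English list is much longer
STOP_WORDS = {"a", "at", "the", "for", "is", "of", "and"}


def top_words(tokens, n=3):
    """Return the n most frequent tokens that are not stop words."""
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)


tokens = "the rocket is a rocket of the company and the company launches the rocket".split()
print(top_words(tokens))  # [('rocket', 3), ('company', 2), ('launches', 1)]
```

Once the stop words are gone, the most frequent remaining words give a rough idea of the topic, which is the same signal the plot shows.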

Great! The code has correctly identified that the web page is about SpaceX.

It was simple and interesting, right? You can similarly identify the topics of news
articles, blogs, etc.

I have done my best to make the article simple and interesting for you, and I hope
you found it useful too.

You have successfully taken your first step towards NLP; there is an ocean left for
you to explore…

If you liked this post, give it a clap. It inspires me to write and share more with you
:)

Thank you…
