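Processing text using NLP | Basics
Last Updated : 22 Sep, 2022

In this article, we will learn the steps used to process text data before using it to train a Machine Learning model.

Importing Libraries
The following libraries are used in this article:
- NLTK library: a suite of libraries and programs for natural language processing, written in the Python programming language.
- urllib library: a URL-handling library that is part of Python's standard library, so it needs no separate installation.
- BeautifulSoup library: a library for extracting data out of HTML and XML documents.
- TextBlob library: used in the lemmatization step.

NLTK, BeautifulSoup and TextBlob can be installed with pip, for example `pip install nltk beautifulsoup4 textblob`. The tokenizer and the stopword list also rely on NLTK data packages, which are downloaded once with `nltk.download()`.

```python
import re  # regular expressions, used to strip punctuation and digits

import nltk
from bs4 import BeautifulSoup
from nltk.corpus import stopwords
from urllib.request import urlopen

# One-time downloads of the NLTK data used below
nltk.download("punkt")      # models used by nltk.word_tokenize
nltk.download("stopwords")  # stopword lists, including English
# Newer NLTK releases may additionally need: nltk.download("punkt_tab")
```

Once the libraries are imported, we need to extract the text. The text can be a Python string or a file that we have to process.

Extracting Data
For this article, we use web scraping to read a web page, and then BeautifulSoup's get_text() method to convert its contents to a str.

```python
# Fetch the raw bytes of the document
raw = urlopen("https://www.w3.org/TR/PNG/iso_8859-1.txt").read()

# Parse the bytes and keep only the text content
raw1 = BeautifulSoup(raw, "html.parser")
raw2 = raw1.get_text()
print(raw2)
```

The output is the plain text of the fetched document.

Data Preprocessing
Once the data extraction is done, the data is ready to process. Follow these steps:

1. Deletion of Punctuation and Numerical Text
```python
# Delete punctuation and numerical values: every character that
# is not an English letter is replaced with a space
def punc(raw2):
    raw2 = re.sub(r"[^a-zA-Z]", " ", raw2)
    return raw2
```

2. Creating Tokens
```python
# Split the text into word tokens
def token(raw2):
    tokens = nltk.word_tokenize(raw2)
    return tokens
```

3. Removing Stopwords
Each token is lowercased before it is checked against the stopword list, because NLTK's English stopwords are all lowercase; otherwise capitalized stopwords such as "The" would be kept.
```python
# Lowercase the letters and remove stopwords
def remove_(tokens):
    stop_words = set(stopwords.words("english"))  # build the lookup set once
    final = [word.lower() for word in tokens if word.lower() not in stop_words]
    return final
```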
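The order of lowercasing and the stopword check matters, since NLTK's English stopword list contains only lowercase words. The dependency-free sketch below compares both orders on a hand-written token list. `SAMPLE_STOPWORDS` is a small hypothetical stand-in for `stopwords.words("english")`, and the tokens are typed by hand instead of coming from `nltk.word_tokenize`.

```python
# Hypothetical, hand-picked stopword set for illustration only;
# NLTK's English list is much larger but, like this one, all lowercase.
SAMPLE_STOPWORDS = {"the", "a", "is", "of"}

# Hand-written tokens standing in for the output of a tokenizer
tokens = ["The", "PNG", "format", "is", "a", "lossless", "format", "of", "images"]

# Filter first, lowercase second: "The" is not in the lowercase set, so it survives.
filter_then_lower = [w.lower() for w in tokens if w not in SAMPLE_STOPWORDS]

# Lowercase first, filter second: stopwords are removed regardless of case.
lower_then_filter = [w for w in (t.lower() for t in tokens) if w not in SAMPLE_STOPWORDS]

print(filter_then_lower)  # ['the', 'png', 'format', 'lossless', 'format', 'images']
print(lower_then_filter)  # ['png', 'format', 'lossless', 'format', 'images']
```

Only the second order removes "The". Keeping the stopwords in a set also makes each membership test an average constant-time hash lookup instead of a scan through a list.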
4. Lemmatization
Lemmatization reduces each word to its dictionary base form (its lemma), for example "images" to "image".
```python
# Lemmatizing
from textblob import TextBlob

# WordNet data used by TextBlob's lemmatizer
# (alternatively run: python -m textblob.download_corpora)
nltk.download("wordnet")

def lemma(final):
    # Join the tokens into one string so TextBlob can process it
    str1 = " ".join(final)
    s = TextBlob(str1)
    # Return the list of lemmatized words
    return [w.lemmatize() for w in s.words]
```

5. Joining the Final Tokens
```python
# Join the final tokens back into a single string
def join_(final):
    review = " ".join(final)
    return review
```

To execute the above functions, use the following code:
```python
# Call all the functions in order
raw2 = punc(raw2)
tokens = token(raw2)
final = remove_(tokens)
final = lemma(final)
ans = join_(final)
print(ans)
```

The output is a single string of lowercase, lemmatized words with the punctuation, numbers and stopwords removed.

Article by noob_coders_ka_baap
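The five steps form a chain in which each function's output must match the next function's input: str → str → list of tokens → list of tokens → list of lemmas → str. The sketch below illustrates that chain without any downloads, under loudly stated assumptions: a whitespace split stands in for `nltk.word_tokenize`, `SAMPLE_STOPWORDS` is a small hypothetical stopword set, and `TOY_LEMMAS` is a hypothetical lookup table replacing TextBlob's WordNet lemmatizer. The helper names (`strip_non_letters`, `split_tokens`, `drop_stopwords`, `toy_lemmatize`, `join_tokens`, `run_pipeline`) are illustrative, not part of the article or any library.

```python
import re
from functools import reduce
from typing import Any, Callable, Iterable

# Hypothetical stand-ins for NLTK/TextBlob resources (illustration only).
SAMPLE_STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in"}
TOY_LEMMAS = {"images": "image", "chunks": "chunk", "formats": "format"}


def strip_non_letters(text: str) -> str:
    # Step 1: replace every non-letter character with a space.
    return re.sub(r"[^a-zA-Z]", " ", text)


def split_tokens(text: str) -> list[str]:
    # Step 2 (approximation): whitespace split instead of nltk.word_tokenize.
    return text.split()


def drop_stopwords(tokens: list[str]) -> list[str]:
    # Step 3: lowercase first, then filter against the stopword set.
    return [w for w in (t.lower() for t in tokens) if w not in SAMPLE_STOPWORDS]


def toy_lemmatize(tokens: list[str]) -> list[str]:
    # Step 4 (toy): look up a base form, keeping unknown words unchanged.
    return [TOY_LEMMAS.get(w, w) for w in tokens]


def join_tokens(tokens: list[str]) -> str:
    # Step 5: join the tokens back into one string.
    return " ".join(tokens)


def run_pipeline(value: Any, steps: Iterable[Callable[[Any], Any]]) -> Any:
    # Pass the output of each step as the input of the next, left to right.
    return reduce(lambda acc, step: step(acc), steps, value)


STEPS = [strip_non_letters, split_tokens, drop_stopwords, toy_lemmatize, join_tokens]

sample = "PNG images are stored in chunks; the 1st chunk is IHDR."
result = run_pipeline(sample, STEPS)
print(result)  # png image stored chunk st chunk ihdr
```

Note the stray token "st": step 1 turns "1st" into " st", so deleting digits can leave word fragments behind. Passing the steps as a list also makes it easy to drop or reorder a stage, for example to skip lemmatization.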