
Machine Learning

HDip in DAB
CCT College Dublin

Text Analytics and NLP


Week 9

Lecturer: Dr. Muhammad Iqbal

©CCT College Dublin 2022


Email: [email protected]
Agenda
• Introduction to NLP
• Cleaning Text
• Parsing and Cleaning HTML
• Removing Punctuation
• Tokenizing Text
• Removing Stop Words
• Stemming Words
• Tagging Parts of Speech
• Encoding Text as a Bag of Words
• Weighting Word Importance
Introduction to NLP
• Imagine a hypothetical person, John Doe. He’s the CTO of a fast-growing technology startup. On a busy
day, John wakes up and has this conversation with his digital assistant.
• John: “How is the weather today?”
• Digital assistant: “It is 37 degrees centigrade outside with no rain today.”
• John: “What does my schedule look like?”
• Digital assistant: “You have a strategy meeting at 4 p.m. and an all-hands at 5:30 p.m. Based on today’s
traffic situation, it is recommended you leave for the office by 8:15 a.m.”
• While he’s getting dressed, John probes the assistant on his fashion choices:
• John: “What should I wear today?”
• Digital assistant: “White seems like a good choice.”
• We might have used smart assistants such as Amazon Alexa, Google Home, Apple Siri, or Microsoft Cortana to do similar things.
• We talk to these assistants not in a programming language, but in our natural language.
Introduction to NLP
• In today's era of the internet and online services, data is being generated at incredible speed and volume.

• Generally, data analysts, engineers, and scientists handle relational or tabular data, whose columns contain either numerical or categorical values.

• Generated data comes in a variety of structures such as text, image, audio, and video. Online activities such as articles, website text, blog posts, and social media posts generate unstructured textual data.

• Corporates and businesses need to analyze textual data to understand customer activities, opinions, and feedback in order to successfully drive their business.

• To cope with big textual data, text analytics is evolving at a faster rate than ever before.

• Text analytics has many applications in today's online world. By analyzing tweets on Twitter, we can find trending news and people's reactions to a particular event. Amazon can understand user feedback or reviews on a specific product. BookMyShow can discover people's opinions about a movie. YouTube can also analyze and understand people's viewpoints on a video.
Compare Text Analytics, NLP and Text Mining
• Text mining is often referred to as text analytics. Text mining is the process of exploring sizeable textual data and finding patterns.
• Text mining processes the text itself, while NLP works with the underlying metadata.
• Finding frequency counts of words, the length of sentences, and the presence/absence of specific words is known as text mining.
• Natural language processing (NLP) is one of the components of text mining. NLP helps to identify sentiment, find entities in a sentence, and determine the category of a blog or article.
• Text mining provides pre-processed data for text analytics. In text analytics, statistical and machine learning algorithms are used to classify information.
What is Special About Learning from Text?
• Most machine learning applications in the text domain work with the bag-of-words
representation in which the words are treated as dimensions with values corresponding
to word frequencies.
• A data set corresponds to a collection of documents, which is also referred to as a
corpus. The complete and distinct set of words used to define the corpus is referred to as
the lexicon.
• Dimensions are also referred to as terms or features. Some text applications work with a binary representation in which the presence of a term in a document corresponds to a value of 1, and its absence to 0.
• Other applications use a normalized function of the word frequencies as the values of the dimensions. In each of these cases, the dimensionality of the data is very large, and may be of the order of 10^5 or even 10^6.
• Furthermore, most values of the dimensions are 0s, and only a few dimensions take on positive values. In other words, text is a high-dimensional, sparse, and non-negative representation.
Cleaning Text
• Problem: We have some unstructured text data and want to complete
some basic cleaning.
• Solution
• Most basic text cleaning needs nothing more than Python's core string operations, in particular strip, replace, and split:
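A minimal sketch of such cleaning (the sample strings below are hypothetical):

```python
# Hypothetical messy text data for illustration
text_data = ["   Interrobang. By Aishwarya Henriette    ",
             "Parking And Going. By Karl Gautier",
             "    Today Is The night. By Jarek Prakash   "]

# Strip leading and trailing whitespace
stripped = [s.strip() for s in text_data]

# Remove periods
no_periods = [s.replace(".", "") for s in stripped]

# Split each string into individual words
print([s.split() for s in no_periods])
# [['Interrobang', 'By', 'Aishwarya', 'Henriette'], ...]
```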
Regular Expressions
• A RegEx, or Regular Expression, is a sequence of characters that forms a search
pattern.
• RegEx can be used to check if a string contains the specified search pattern.
• Python has a built-in package called re, which can be used to work with Regular
Expressions. For example, import re

https://www.w3schools.com/python/python_regex.asp
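A short sketch of common re operations (the example string and patterns are illustrative):

```python
import re

text = "The rain in Spain stays mainly in the plain since 2024"

# Check whether the string contains the search pattern
print(bool(re.search(r"Spain", text)))   # True

# Find all runs of digits
print(re.findall(r"\d+", text))          # ['2024']

# Replace runs of whitespace with a single space
print(re.sub(r"\s+", " ", "too   many    spaces"))  # 'too many spaces'
```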
Parsing and Cleaning HTML
• Problem: We have text data with HTML elements and want to extract just the text.
• Solution: Use Beautiful Soup’s extensive set of options to parse and extract from
HTML.

• Despite the strange name, Beautiful Soup is a powerful Python library designed for scraping HTML. Beautiful Soup is usually used to scrape live websites, but we can just as easily use it to extract text data embedded in HTML. The full range of Beautiful Soup operations is much wider, but the few methods used in our solution are all we need to extract the text.
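A minimal sketch, assuming the beautifulsoup4 package is installed and using a made-up HTML snippet:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet for illustration
html = "<div class='full_name'><span style='font-weight:bold'>Masego</span> Azra</div>"

soup = BeautifulSoup(html, "html.parser")

# Find the div with class 'full_name' and extract just the text
print(soup.find("div", {"class": "full_name"}).text)  # 'Masego Azra'
```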
Removing Punctuation
• Problem: You have a feature of text data and want to remove punctuation.
• Solution: Define a function that uses translate with a dictionary of punctuation characters:

• translate is a Python string method popular due to its blazing speed. In this solution, we created a dictionary, punctuation, with all punctuation characters according to Unicode as its keys and None as its values.
• We translated all characters in the string that appear in punctuation into None, effectively removing them. There are more readable ways to remove punctuation.
• It is important to be conscious of the fact that punctuation contains information (e.g., “Right?” versus “Right!”). Removing punctuation is a necessary evil to create features; however, if the punctuation matters for the task at hand, consider keeping it.
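A sketch of the translate-based approach described above; note that building the punctuation dictionary scans the full Unicode range, so it takes a moment:

```python
import sys
import unicodedata

text_data = ["Hi!!!! I. Love. This. Song....",
             "10000% Agree!!!! #LoveIT",
             "Right?!?!"]

# Dictionary mapping every Unicode punctuation code point to None
punctuation = dict.fromkeys(
    (i for i in range(sys.maxunicode)
     if unicodedata.category(chr(i)).startswith("P")),
    None)

# Translate punctuation characters to None, i.e., remove them
print([s.translate(punctuation) for s in text_data])
# ['Hi I Love This Song', '10000 Agree LoveIT', 'Right']
```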
Tokenizing Text
• Problem: You have text and want to break it up into individual words.
• Solution: Natural Language Toolkit for Python (NLTK) has a powerful set of text
manipulation operations, including word tokenizing:
• Tokenization is a common task after cleaning text data because it is the first step in the process of turning the text into data we will use to construct useful features.
• We use the method word_tokenize() to split a sentence into words. The output of word tokenization can be converted to a DataFrame for better text understanding in machine learning applications.
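A minimal sketch (the first run may require downloading the tokenizer models):

```python
from nltk.tokenize import word_tokenize
# import nltk; nltk.download('punkt')  # needed once, if not already installed

string = "The science of today is the technology of tomorrow"
print(word_tokenize(string))
# ['The', 'science', 'of', 'today', 'is', 'the', 'technology', 'of', 'tomorrow']
```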
Removing Stop Words
• Problem: Given tokenized text data, you want to remove extremely common words (e.g., a, is, of, on) that contain little informational value.

• Solution: Use NLTK's stopwords:

• While “stop words” can refer to any set of words that we want to remove before processing, frequently the term refers to extremely common words that themselves contain little information value.

• NLTK has a list of common stop words that we can use to find and remove stop words in our tokenized words.
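A minimal sketch; note that NLTK's stop words are lowercase, so tokens should be lowercased first:

```python
from nltk.corpus import stopwords
# import nltk; nltk.download('stopwords')  # needed once

tokenized_words = ['i', 'am', 'going', 'to', 'go', 'to',
                   'the', 'store', 'and', 'park']

stop_words = set(stopwords.words('english'))

# Keep only the tokens that are not stop words
print([word for word in tokenized_words if word not in stop_words])
# ['going', 'go', 'store', 'park']
```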
Stemming Words
• Problem: You have tokenized words and want to convert them into their root forms.
• Solution: Use NLTK’s PorterStemmer:

• Stemming reduces a word to its stem by identifying and removing affixes (e.g., gerunds)
while keeping the root meaning of the word. For example, both “tradition” and
“traditional” have “tradit” as their stem, indicating that while they are different words
they represent the same general concept.
• By stemming text data, we transform it to something less readable, but closer to its base
meaning and thus more suitable for comparison across observations. NLTK’s
PorterStemmer implements the widely used Porter stemming algorithm to remove or
replace common suffixes to produce the word stem.
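A minimal sketch of Porter stemming on a list of tokens:

```python
from nltk.stem.porter import PorterStemmer

tokenized_words = ['i', 'am', 'humbled', 'by', 'this', 'traditional', 'meeting']

porter = PorterStemmer()
print([porter.stem(word) for word in tokenized_words])
# ['i', 'am', 'humbl', 'by', 'thi', 'tradit', 'meet']
```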
Tagging Parts of Speech
• Problem: You have text data and want to tag each word or character with its part of speech.
• Solution: Use NLTK’s pre-trained parts-of-speech tagger:

• The output is a list of tuples with each word and its part-of-speech tag. NLTK uses the Penn Treebank part-of-speech tags. Some examples of the Penn Treebank tags can be found at:
https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
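A minimal sketch (the pre-trained tagger may require a one-time download):

```python
from nltk import pos_tag, word_tokenize
# import nltk; nltk.download('averaged_perceptron_tagger')  # needed once

text = "Chris loved outdoor running"
print(pos_tag(word_tokenize(text)))
# [('Chris', 'NNP'), ('loved', 'VBD'), ('outdoor', 'JJ'), ('running', 'VBG')]
```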
Tagging Parts of Speech
• A more realistic situation would be that we have data where every observation contains a tweet.
• We want to convert those sentences into features for individual parts of speech (e.g., a feature with 1 if a proper noun is present, and 0 otherwise).

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html
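A sketch of this idea using scikit-learn's MultiLabelBinarizer; the tweets below are made up for illustration:

```python
from nltk import pos_tag, word_tokenize
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical tweets for illustration
tweets = ["I am eating a burrito for breakfast",
          "Political science is an amazing field",
          "San Francisco is an awesome city"]

# For each tweet, collect the part-of-speech tags it contains
tagged_tweets = [[tag for _, tag in pos_tag(word_tokenize(tweet))]
                 for tweet in tweets]

# One-hot encode: one feature per tag, 1 if the tag occurs in the tweet
one_hot = MultiLabelBinarizer()
features = one_hot.fit_transform(tagged_tweets)
print(one_hot.classes_)  # the tag vocabulary, e.g. ['DT' 'IN' 'JJ' ...]
print(features)
```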
Tagging Parts of Speech
• If our text is English and not on a specialized topic (e.g., medicine), the simplest
solution is to use NLTK’s pre-trained parts-of-speech tagger.
• However, if pos_tag is not very accurate, NLTK also gives us the ability to train our
own tagger.
• The major downside of training a tagger is that we need a large corpus of text
where the tag of each word is known.
• Constructing this tagged corpus is labor intensive and is probably going to be a
last resort.
• If we had a tagged corpus and wanted to train a tagger, the following is an
example of how we could do it.
• The corpus we are using is the Brown Corpus, one of the most popular sources of
tagged text.
https://en.wikipedia.org/wiki/Brown_Corpus
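A sketch of one common way to do this, training a backoff chain of n-gram taggers on the Brown Corpus; the train/test split is arbitrary and the exact accuracy depends on it:

```python
from nltk.corpus import brown
from nltk.tag import UnigramTagger, BigramTagger, TrigramTagger
# import nltk; nltk.download('brown')  # needed once

# Tagged sentences from the Brown Corpus 'news' category
sentences = brown.tagged_sents(categories='news')
train, test = sentences[:4000], sentences[4000:]

# Trigram tagger that backs off to a bigram, then a unigram tagger
unigram = UnigramTagger(train)
bigram = BigramTagger(train, backoff=unigram)
trigram = TrigramTagger(train, backoff=bigram)

# On older NLTK versions use trigram.evaluate(test) instead
print(trigram.accuracy(test))  # roughly 0.8 on this split
```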
Encoding Text as a Bag of Words

• Problem: You have text data and want to create a set of features indicating the number of times an observation's text contains a particular word.
• Solution: Use scikit-learn's CountVectorizer:
• This output is a sparse array, which is necessary when we have a large amount of text. However, in our toy example we can use toarray to view a matrix of word counts for each observation.
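A minimal sketch with a toy corpus:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

text_data = np.array(['I love Brazil. Brazil!',
                      'Sweden is best',
                      'Germany beats both'])

count = CountVectorizer()
bag_of_words = count.fit_transform(text_data)  # a sparse matrix

print(bag_of_words.toarray())  # dense view, fine for this toy example
print(count.get_feature_names_out())  # use get_feature_names() on scikit-learn < 1.0
# ['beats' 'best' 'both' 'brazil' 'germany' 'is' 'love' 'sweden']
```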
Encoding Text as a Bag of Words
• One of the most common methods of transforming text into features is by using a
bag-of-words model.
• Bag-of-words models output a feature for every unique word in the text data, with
each feature containing a count of occurrences in the observations.
• For example, in our solution the sentence “I love Brazil. Brazil!” has a value of 2 in the “brazil” feature because the word brazil appears two times.
• The text data in our solution was purposely small. In the real world, a single observation of text data could be the contents of an entire book!
• Since our bag-of-words model creates a feature for every unique word in the data, the resulting matrix can contain thousands of features.
• This means that the matrix can take up a lot of memory. However, we can exploit a common characteristic of bag-of-words feature matrices, their sparsity, to reduce the amount of data we need to store.
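Continuing the toy example above, the sparse matrix only stores the non-zero counts:

```python
# Shape of the matrix vs. how many cells are actually non-zero
print(bag_of_words.shape)  # (3, 8): 3 documents, 8 unique words
print(bag_of_words.nnz)    # 8: only the non-zero entries (of 24 cells) are stored
```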
Text Classification
• Text classification is one of the important tasks
of text mining. It is a supervised approach.
• It identifies the category or class of a given text, such as a blog, book, web page, news article, or tweet.
• It has various applications in today's computer world, such as spam detection, task categorization in CRM services, categorizing products on e-retailer websites, classifying the content of websites for a search engine, analyzing the sentiment of customer feedback, etc.
• We will learn how to do text classification in Python.
Feature Generation using Bag of Words
• In text classification, we have a set of texts and their respective labels. But we can't use raw text directly in our model; we need to convert the text into numbers or vectors of numbers.

• The bag-of-words model (BoW) is the simplest way of extracting features from text. BoW converts text into a matrix of the occurrence of words within a document. This model is concerned only with whether given words occur in the document, not with word order.

• Example: There are three documents:

• Doc 1: I love dogs. Doc 2: I hate dogs and knitting. Doc 3: Knitting is my hobby and passion.

• Now, we can create a matrix of documents and words by counting the occurrence of words in each document. This matrix is known as the Document-Term Matrix (DTM).
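A sketch of building this DTM with scikit-learn; note that CountVectorizer's default token pattern drops one-character tokens, so “I” does not appear as a term:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

docs = ["I love dogs.",
        "I hate dogs and knitting.",
        "Knitting is my hobby and passion."]

count = CountVectorizer()
dtm = count.fit_transform(docs)

# Document-Term Matrix as a DataFrame, one row per document
print(pd.DataFrame(dtm.toarray(),
                   columns=count.get_feature_names_out(),
                   index=["Doc 1", "Doc 2", "Doc 3"]))
#        and  dogs  hate  hobby  is  knitting  love  my  passion
# Doc 1    0     1     0      0   0         0     1   0        0
# Doc 2    1     1     1      0   0         1     0   0        0
# Doc 3    1     0     0      1   1         1     0   1        1
```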
Feature Generation using TF-IDF
• In Term Frequency (TF), we count the number of times each word occurs in a document. The main issue with term frequency is that it gives more weight to longer documents. Term frequency is basically the output of the BoW model.
• IDF (Inverse Document Frequency) measures the amount of information a given word provides across the documents. IDF is the logarithmically scaled inverse ratio of the number of documents that contain the word to the total number of documents: idf(t) = log(N / df(t)), where N is the total number of documents and df(t) is the number of documents containing the term t.

• TF-IDF (Term Frequency-Inverse Document Frequency) normalizes the document-term matrix. It is the product of TF and IDF. A word with a high TF-IDF score occurs frequently in the given document and is largely absent from the other documents, so it acts as a signature word for that document.
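A minimal sketch using scikit-learn's TfidfVectorizer on the three documents above; note that scikit-learn uses a smoothed IDF variant and L2-normalizes each row, so the scores differ slightly from the plain formula:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["I love dogs.",
        "I hate dogs and knitting.",
        "Knitting is my hobby and passion."]

# By default scikit-learn computes idf(t) = ln((1 + N) / (1 + df(t))) + 1
tfidf = TfidfVectorizer()
feature_matrix = tfidf.fit_transform(docs)

print(tfidf.get_feature_names_out())
print(feature_matrix.toarray().round(2))
```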
Model Building and Evaluation (TF-IDF)
• Let us split the dataset using the function train_test_split().
• We need to pass three parameters: the features, the target, and the test set size. Additionally, we can set random_state to make the random selection of records reproducible.
• First, import the MultinomialNB module and create the Multinomial Naive Bayes classifier object using the MultinomialNB() function.
• Then, fit the model on the training set using fit() and perform prediction on the test set using predict().
• We got a classification rate of 58.65% using TF-IDF features, which is not considered good accuracy. We need to improve the accuracy by using some other preprocessing or feature engineering. Consider which approaches might improve it.
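A sketch of these steps; it assumes X is a TF-IDF feature matrix and y the corresponding sentiment labels (both hypothetical names, e.g. produced by TfidfVectorizer on the movie-review phrases):

```python
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

# X: TF-IDF feature matrix, y: sentiment labels (assumed to exist)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# Create and fit the Multinomial Naive Bayes classifier
clf = MultinomialNB()
clf.fit(X_train, y_train)

# Predict on the test set and evaluate accuracy
predicted = clf.predict(X_test)
print("Accuracy:", metrics.accuracy_score(y_test, predicted))
```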
Sentiment Analysis
• As data analysts, it is important to understand sentiments and what they really mean.
• There are mainly two approaches to performing sentiment analysis:
• Lexicon-based: count the number of positive and negative words in the given text; the larger count determines the sentiment of the text.
• Machine learning based: develop a classification model, trained on a pre-labeled dataset of positive, negative, and neutral examples.
• In this lecture, we use the second (machine learning based) approach.
• We have learned data pre-processing using NLTK. Now we learn text classification, performing Multinomial Naive Bayes classification using scikit-learn.
• In the model building part, we can use the “Sentiment Analysis of Movie Reviews” dataset available on Kaggle.
• The dataset is a tab-separated file with four columns: PhraseId, SentenceId, Phrase, and Sentiment. The data has 5 sentiment labels, ranging from 0 (negative) to 4 (positive).
• This is how we learn sentiment analysis and text classification.
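For contrast, a toy sketch of the lexicon-based approach; the word lists are tiny, hypothetical stand-ins for a real sentiment lexicon:

```python
# Hypothetical miniature sentiment lexicon
positive_words = {"good", "great", "love", "awesome", "excellent"}
negative_words = {"bad", "terrible", "hate", "awful", "poor"}

def lexicon_sentiment(text):
    # Score = (# positive tokens) - (# negative tokens)
    tokens = text.lower().split()
    score = (sum(t in positive_words for t in tokens)
             - sum(t in negative_words for t in tokens))
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(lexicon_sentiment("I love this great movie"))  # positive
```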
Reference and Resources
• Introduction to Machine Learning with Python: A Guide for Data Scientists, Andreas C. Müller and Sarah Guido, O'Reilly, 2017.
• https://learning.oreilly.com/library/view/machine-learning-with/9781491989371/ch06.html
• https://www.datacamp.com/community/tutorials/text-analytics-beginners-nltk
• Neural Network Projects with Python, James Loy, Packt Publishing, 2019.
