Python | PoS Tagging and Lemmatization using spaCy
Last Updated :
29 Mar, 2019
spaCy is one of the best text analysis library. spaCy excels at large-scale information extraction tasks and is one of the fastest in the world. It is also the best way to prepare text for deep learning. spaCy is much faster and accurate than NLTKTagger and TextBlob.
How to Install ?
pip install spacy
python -m spacy download en_core_web_sm
Top Features of spaCy:
1. Non-destructive tokenization
2. Named entity recognition
3. Support for 49+ languages
4. 16 statistical models for 9 languages
5. Pre-trained word vectors
6. Part-of-speech tagging
7. Labeled dependency parsing
8. Syntax-driven sentence segmentation
Import and Load Library:
Python3
import spacy
# python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
POS-Tagging for Reviews:
It is a method of identifying words as nouns, verbs, adjectives, adverbs, etc.
Python3
import spacy
# Load English tokenizer, tagger,
# parser, NER and word vectors
nlp = spacy.load("en_core_web_sm")
# Process whole documents
text = ("""My name is Shaurya Uppal.
I enjoy writing articles on GeeksforGeeks checkout
my other article by going to my profile section.""")
doc = nlp(text)
# Token and Tag
for token in doc:
print(token, token.pos_)
# You want list of Verb tokens
print("Verbs:", [token.text for token in doc if token.pos_ == "VERB"])
Output:
My DET
name NOUN
is VERB
Shaurya PROPN
Uppal PROPN
. PUNCT
I PRON
enjoy VERB
writing VERB
articles NOUN
on ADP
GeeksforGeeks PROPN
checkout VERB
my DET
other ADJ
article NOUN
by ADP
going VERB
to ADP
my DET
profile NOUN
section NOUN
. PUNCT
# Verb based Tagged Reviews:-
Verbs: ['is', 'enjoy', 'writing', 'checkout', 'going']
Lemmatization:
It is a process of grouping together the inflected forms of a word so they can be analyzed as a single item, identified by the word's lemma, or dictionary form.
Python3
import spacy
# Load English tokenizer, tagger,
# parser, NER and word vectors
nlp = spacy.load("en_core_web_sm")
# Process whole documents
text = ("""My name is Shaurya Uppal. I enjoy writing
articles on GeeksforGeeks checkout my other
article by going to my profile section.""")
doc = nlp(text)
for token in doc:
print(token, token.lemma_)
Output:
My -PRON-
name name
is be
Shaurya Shaurya
Uppal Uppal
. .
I -PRON-
enjoy enjoy
writing write
articles article
on on
GeeksforGeeks GeeksforGeeks
checkout checkout
my -PRON-
other other
article article
by by
going go
to to
my -PRON-
profile profile
section section
. .
Similar Reads
Python | Lemmatization with NLTK Lemmatization is an important text pre-processing technique in Natural Language Processing (NLP) that reduces words to their base form known as a "lemma." For example, the lemma of "running" is "run" and "better" becomes "good." Unlike stemming which simply removes prefixes or suffixes, it considers
6 min read
Python | Lemmatization with NLTK Lemmatization is an important text pre-processing technique in Natural Language Processing (NLP) that reduces words to their base form known as a "lemma." For example, the lemma of "running" is "run" and "better" becomes "good." Unlike stemming which simply removes prefixes or suffixes, it considers
6 min read
Python | Lemmatization with TextBlob Lemmatization is the process of grouping together the different inflected forms of a word so they can be analyzed as a single item. Lemmatization is similar to stemming but it brings context to the words. So it links words with similar meanings to one word.Text preprocessing includes both Stemming a
2 min read
Python | Lemmatization with TextBlob Lemmatization is the process of grouping together the different inflected forms of a word so they can be analyzed as a single item. Lemmatization is similar to stemming but it brings context to the words. So it links words with similar meanings to one word.Text preprocessing includes both Stemming a
2 min read
POS(Parts-Of-Speech) Tagging in NLP Parts of Speech (PoS) tagging is a core task in NLP, It gives each word a grammatical category such as nouns, verbs, adjectives and adverbs. Through better understanding of phrase structure and semantics, this technique makes it possible for machines to study human language more accurately. PoS tagg
7 min read
POS(Parts-Of-Speech) Tagging in NLP Parts of Speech (PoS) tagging is a core task in NLP, It gives each word a grammatical category such as nouns, verbs, adjectives and adverbs. Through better understanding of phrase structure and semantics, this technique makes it possible for machines to study human language more accurately. PoS tagg
7 min read