Module-3
Introduction to Natural Language Processing
Course Objectives:
The objectives of this course are:
To enrich algorithmic knowledge of the application of various syntactic and semantic parsing techniques in the NLP process.
To gain a strong understanding of natural language generation in the NLP process.
Module -3 Syntactic Analysis (POS Tagging & Chunking)
POS tagging using NLP libraries (NLTK, spaCy); chunking of text data.
Introduction to Syntactic Parsing
Syntactic parsing analyzes a sentence according to a formal grammar, breaking the text into its constituent parts and identifying the grammatical relationships between them. The two main approaches are:
Constituency Parsing
Dependency Parsing
Difference between Constituency Parsing and Dependency Parsing
Why Do We Need Syntactic Analysis?
Parsing and Understanding: It helps computers parse sentences and understand their grammatical structure,
allowing them to distinguish subjects from objects, identify verb predicates, and recognize modifiers. This
understanding is crucial for various NLP tasks like machine translation and sentiment analysis.
Ambiguity Resolution: Natural language often contains ambiguities that humans effortlessly resolve through
context and syntax. Syntactic analysis aids in resolving such ambiguities by providing a structured framework for
interpretation.
Grammar and Language Generation: Syntax is essential not only for comprehension but also for language generation.
It guides the generation of coherent and grammatically correct sentences, making chatbots and text generators more
effective.
Information Extraction: In applications like information retrieval and question answering, identifying syntactic
patterns and relationships in text is essential for extracting relevant information from unstructured data.
Types of Syntactic Structures
Syntactic structures encompass various elements, including:
Phrases: These are groups of words that function together as a single unit within a sentence. Common types
include noun phrases (NP) and verb phrases (VP).
Clauses: Clauses are larger units of language that consist of a subject and a predicate. They can be independent (main
clauses) or dependent (subordinate clauses).
Dependency Relations: In dependency grammar, words in a sentence are linked by dependency relations, showing
how they depend on each other. These relations help capture the syntactic relationships between words.
Constituency Relations: Constituency grammar breaks sentences into constituents, such as noun phrases and verb
phrases, which represent the syntactic structure of a sentence.
Example of Syntactic Analysis in NLP
Syntactic analysis in NLP involves parsing a sentence to understand its grammatical structure. Here’s an
example:
Sentence: “The quick brown fox jumps over the lazy dog.”
Tokenization: The first step is to tokenize the sentence, breaking it down into individual words:
“The” | “quick” | “brown” | “fox” | “jumps” | “over” | “the” | “lazy” | “dog” | “.”
Part-of-Speech Tagging: Next, part-of-speech tags are assigned to each word to identify its grammatical
role:
“The” (Article) | “quick” (Adjective) | “brown” (Adjective) | “fox” (Noun) | “jumps” (Verb) | “over”
(Preposition) | “the” (Article) | “lazy” (Adjective) | “dog” (Noun) | “.” (Punctuation)
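As a quick illustration, here is a minimal sketch of these two steps using NLTK (assuming the "punkt" tokenizer and "averaged_perceptron_tagger" resources have been downloaded via nltk.download; the exact tags may vary slightly with the tagger version):

```python
import nltk

sentence = "The quick brown fox jumps over the lazy dog."

# Step 1: tokenization
tokens = nltk.word_tokenize(sentence)

# Step 2: part-of-speech tagging (Penn Treebank tag set)
tagged = nltk.pos_tag(tokens)
print(tagged)
# e.g. [('The', 'DT'), ('quick', 'JJ'), ('brown', 'JJ'), ('fox', 'NN'),
#       ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'),
#       ('dog', 'NN'), ('.', '.')]
```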
Example of Syntactic Analysis in NLP
Dependency Parsing: Syntactic analysis involves creating a parse tree or dependency graph to show the
relationships between words. In a simplified dependency structure for this sentence, “jumps” is the root; “fox” is its nominal subject, with “The”, “quick”, and “brown” as its determiner and modifiers; “over” attaches to “jumps” as a preposition, and “dog” is the object of that preposition, modified by “the” and “lazy”.
This syntactic analysis helps the NLP system understand the grammatical relationships within the sentence,
which can be valuable for various NLP tasks, such as information extraction, sentiment analysis, and machine
translation.
Part-of-Speech (POS) Tagging
Part-of-Speech (POS) tagging is a fundamental task in Natural Language Processing (NLP) that involves
assigning a grammatical category (such as noun, verb, adjective, etc.) to each word in a sentence.
The goal is to understand the syntactic structure of a sentence and identify the grammatical roles of individual words.
POS tagging provides essential information for various NLP applications, including text analysis, machine translation,
and information retrieval.
Key Concepts:
1. POS Tags:
• POS tags are short codes representing specific parts of speech. Common POS tags include:
• Noun (NN)
• Verb (VB)
• Adjective (JJ)
• Adverb (RB)
• Pronoun (PRP)
• Preposition (IN)
• Conjunction (CC)
• Determiner (DT)
• Interjection (UH)
2. Tag Sets:
• Different tag sets may be used depending on the POS tagging system or language. For example, the Penn
Treebank POS tag set is widely used in English NLP tasks.
3. Ambiguity:
• Words may have multiple possible POS tags based on context. For example, “lead” can be a noun (the
metal) or a verb (to guide).
Example:
Text: “The cat sat on the mat.”
POS tags:
• The: determiner
• cat: noun
• sat: verb
• on: preposition
• the: determiner
• mat: noun
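These tags can be produced with an off-the-shelf tagger; a minimal sketch using spaCy (assuming the en_core_web_sm model has been installed with `python -m spacy download en_core_web_sm`):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The cat sat on the mat.")

for token in doc:
    # token.pos_ is the coarse category, token.tag_ the fine-grained Penn Treebank tag
    print(f"{token.text:<5} {token.pos_:<6} {token.tag_}")
# The   DET    DT
# cat   NOUN   NN
# sat   VERB   VBD
# on    ADP    IN    (ADP = adposition, i.e. preposition)
# the   DET    DT
# mat   NOUN   NN
# .     PUNCT  .
```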
Methods of POS Tagging
1. Rule-Based Tagging:
Based on handcrafted rules that consider word morphology, context, and syntactic information. It
can be effective but may struggle with ambiguity.
2. Statistical Tagging:
Uses statistical models trained on large annotated corpora to predict POS tags. Hidden Markov
Models (HMMs) and Conditional Random Fields (CRFs) are common statistical approaches.
Rule Based POS Tagging
Rule-based part-of-speech (POS) tagging is a method of labeling words with their corresponding
parts of speech using a set of pre-defined rules.
In a rule-based POS tagging system, words are assigned POS tags based on their characteristics
and the context in which they appear. For example, a rule-based POS tagger might assign the tag
“noun” to any word that ends in “-tion” or “-ment,” as these suffixes are often used to form nouns.
Example:
• If the word ends in “-tion,” assign the tag “noun.”
• If the word ends in “-ment,” assign the tag “noun.”
• If the word is all uppercase, assign the tag “proper noun.”
• If the word is a verb ending in “-ing,” assign the tag “verb.”
07-04-2025
Iterate through the words in the text and apply the rules to each word in turn. For
example:
• “Nation” would be tagged as “noun” based on the first rule.
• “Investment” would be tagged as “noun” based on the second rule.
• “UNITED” would be tagged as “proper noun” based on the third rule.
• “Running” would be tagged as “verb” based on the fourth rule.
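A minimal sketch of such a rule-based tagger in Python, implementing only the four illustrative rules above (the function name rule_based_tag is hypothetical; real rule-based taggers use much larger, context-sensitive rule sets):

```python
def rule_based_tag(word):
    """Assign a tag using the simple suffix/case rules described above."""
    if word.lower().endswith("tion") or word.lower().endswith("ment"):
        return "noun"
    if word.isupper():
        return "proper noun"
    if word.lower().endswith("ing"):
        return "verb"
    return "unknown"  # no rule matched

for w in ["Nation", "Investment", "UNITED", "Running"]:
    print(w, "->", rule_based_tag(w))
# Nation -> noun, Investment -> noun, UNITED -> proper noun, Running -> verb
```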
Statistical POS Tagging
Statistical part-of-speech (POS) tagging is a method of labeling words with their corresponding
parts of speech using statistical techniques. This is in contrast to rule-based POS tagging, which
relies on pre-defined rules, and to unsupervised learning-based POS tagging, which does not use
any annotated training data.
In statistical POS tagging, a model is trained on a large annotated corpus of text to learn the
patterns and characteristics of different parts of speech. The model uses this training data to predict
the POS tag of a given word based on the context in which it appears and the probability of
different POS tags occurring in that context.
The typical workflow is:
1. Collect a large corpus of text annotated with the correct POS tags.
2. Split the annotated corpus into training and testing sets.
3. Train the statistical model (e.g., an HMM or CRF) on the training data.
4. Evaluate the performance of the model by comparing the predicted tags to the true tags in the
testing data and calculating metrics such as precision and recall.
5. Fine-tune the model and repeat the process until the desired level of accuracy is achieved.
6. Use the trained model to perform POS tagging on new, unseen text.
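A minimal sketch of this workflow using NLTK's simple n-gram taggers trained on the Penn Treebank sample corpus (assuming the treebank corpus has been downloaded with nltk.download('treebank'); HMM and CRF taggers follow the same train/evaluate pattern):

```python
import nltk
from nltk.corpus import treebank

# Steps 1-2: annotated corpus, split into training and testing data
tagged_sents = treebank.tagged_sents()
split = int(len(tagged_sents) * 0.9)
train_sents, test_sents = tagged_sents[:split], tagged_sents[split:]

# Step 3: train a bigram tagger that backs off to a unigram tagger,
# which in turn backs off to a default "NN" tag
t0 = nltk.DefaultTagger("NN")
t1 = nltk.UnigramTagger(train_sents, backoff=t0)
t2 = nltk.BigramTagger(train_sents, backoff=t1)

# Step 4: evaluate on held-out data (use .evaluate() on older NLTK versions)
print("accuracy:", t2.accuracy(test_sents))

# Step 6: tag new, unseen text
print(t2.tag("The cat sat on the mat .".split()))
```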
Importance and Applications
Syntactic Analysis: POS tagging is crucial for understanding the grammatical structure of a sentence,
enabling syntactic analysis. It helps identify the subject, verb, object, and other syntactic elements.
Semantic Analysis: POS tags contribute to understanding the meaning of words in context. For
example, distinguishing between a noun and a verb can significantly impact the interpretation of a
sentence.
Information Retrieval: POS tagging is used in information retrieval systems to improve the
precision and relevance of search results. For instance, searching for “NN” (noun) in a document can
prioritize nouns over other words.
Named Entity Recognition (NER): POS tags play a role in named entity recognition by providing
information about the grammatical category of words. For example, recognizing that “New York”
is a proper noun.
Machine Translation: POS tagging is essential in machine translation to ensure accurate
translation based on the grammatical structure of sentences.
Chunking: Shallow Parsing and Phrase Extraction
Shallow parsing, also known as chunking, is a type of natural language processing (NLP) technique
that aims to identify and extract meaningful phrases or chunks from a sentence.
Shallow parsing focuses on identifying individual phrases or constituents, such as noun phrases, verb
phrases, and prepositional phrases.
Full parsing involves analyzing the entire grammatical structure of a sentence, which can be
computationally intensive and time-consuming. Shallow parsing, on the other hand, involves
identifying and extracting only the most important phrases or constituents, making it faster and more
efficient than full parsing.
Key Steps in Shallow parsing
The first step is sentence segmentation, where a sentence is divided into individual words or
tokens.
The next step is part-of-speech tagging, where each token is assigned a grammatical category, such
as noun, verb, or adjective.
The final step is to identify and extract the relevant phrases or constituents from the sentence. This is
typically done using pattern matching or machine learning algorithms that have been trained to
recognize specific types of phrases or constituents.
Difference between Full Parsing and Shallow Parsing
Phrase Structure and Chunks
A phrase is a group of words that work together as a unit in a sentence. Each phrase
has a head word (main word) and possibly other modifiers. Phrases don’t necessarily
form a complete sentence on their own.
In chunking (shallow parsing), we group these phrases into "chunks", which are flat,
non-overlapping sequences of words representing partial syntactic structure.
Types of Phrases (Chunks)
Noun phrase chunking involves identifying and extracting all the noun phrases in a sentence. Noun
phrases typically consist of a noun and any associated adjectives, determiners, or modifiers. For example,
in the sentence “The black cat sat on the mat,” the noun phrase “the black cat” can be identified and
extracted using noun phrase chunking.
Verb phrase chunking involves identifying and extracting all the verb phrases in a sentence. Verb phrases
typically consist of a verb and any associated adverbs, particles, or complements. For example, in the
sentence “She sings beautifully,” the verb phrase “sings beautifully” can be identified and extracted using
verb phrase chunking.
Types of Phrases (Chunks)
Adverb phrase (ADVP) chunking identifies groups of words that work together to modify or describe a
verb, adjective, or another adverb. The main word in the phrase is an adverb, and it may be surrounded by
modifiers or complements. Example: “very quickly” (modifier: “very”; adverb head: “quickly”), as in
“She ran [ADVP very quickly] to catch the bus.”
Example of Chunking
Sentence: “The quick brown fox jumps over the lazy dog.”
POS tags: DT JJ JJ NN VBZ IN DT JJ NN
Chunks: [NP The quick brown fox] [VP jumps] [PP over] [NP the lazy dog]
Phrase Extraction
Phrase Extraction is the process of identifying and retrieving specific meaningful phrases
—like noun phrases (NP) or verb phrases (VP)—from text that has been chunked
(shallow parsed). It’s a common step in many NLP pipelines when you want key pieces
of information without doing full parsing.
How Phrase Extraction Works
1. Tokenize and POS-tag the sentence
2. Apply a chunking pattern (e.g., NP: {<DT>?<JJ>*<NN>+})
3. Extract the chunks (subtrees labeled "NP", "VP", etc.)
4. Return the chunked phrases as strings or spans
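A minimal sketch of these four steps with NLTK's RegexpParser, using the chunk pattern from step 2 (assumes the tokenizer and tagger resources have been downloaded):

```python
import nltk

sentence = "The quick brown fox jumps over the lazy dog."

# Steps 1-2: tokenize and POS-tag
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))

# Step 3: apply the chunking pattern (optional determiner, any adjectives, one or more nouns)
chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>+}")
tree = chunker.parse(tagged)

# Step 4: return the chunked phrases as strings
noun_phrases = [" ".join(word for word, tag in subtree.leaves())
                for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")]
print(noun_phrases)
# e.g. ['The quick brown fox', 'the lazy dog']
```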
Extracting Noun/Verb Phrases from Chunked Output
After chunking a sentence, we typically get a tree-like structure or tagged sequence (e.g., using BIO tags or
regex-based chunking). From this, we extract phrases such as:
Noun Phrases (NP) → entities, subjects, objects
Verb Phrases (VP) → actions, events
Example: “The talented musician played a beautiful melody.”
After chunking: [NP The talented musician] [VP played] [NP a beautiful melody]
Extracted noun phrases: “The talented musician”, “a beautiful melody”
Extracted verb phrase: “played”
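A minimal sketch of this extraction with NLTK, adding a simple VP rule alongside the NP rule (both chunk patterns are illustrative assumptions, not the only possible grammar):

```python
import nltk

sentence = "The talented musician played a beautiful melody."
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))

grammar = r"""
  NP: {<DT>?<JJ>*<NN.*>+}     # noun phrase: optional determiner, adjectives, nouns
  VP: {<VB.*>}                # verb phrase (simplified): any verb form
"""
tree = nltk.RegexpParser(grammar).parse(tagged)

for subtree in tree.subtrees(filter=lambda t: t.label() in ("NP", "VP")):
    print(subtree.label(), "->", " ".join(word for word, tag in subtree.leaves()))
# expected:
# NP -> The talented musician
# VP -> played
# NP -> a beautiful melody
```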
Applications of Phrase Extraction
1. Search Engines
Extracts key phrases to better index and match user queries with content. Improves autocomplete
and suggested search features.
2. Information Retrieval
Pulls out relevant entities or concepts from documents (like names, events, topics). Enhances semantic
search by focusing on meaningful units.
3. Text Summarization
• Uses noun/verb phrases as building blocks for concise summaries.
• Helps identify important actions and entities.
Chunking is a powerful NLP technique that enhances POS tagging by grouping words
into meaningful phrases.
Challenges in Chunking
Dependency Parsing: The Basics
Dependency Parsing is a process in Natural Language Processing (NLP) that analyzes the
grammatical structure of a sentence by identifying relationships (dependencies) between words.
Instead of building hierarchical phrase structures (as in constituency parsing), it represents a sentence as a
directed graph in which each word is a node and edges (arrows) represent grammatical relationships
between head words and their dependents.
Dependency Relations (for the sentence “The cat sat on the mat.”):
sat → root
cat → nsubj (nominal subject of “sat”)
on → prep (preposition modifying “sat”)
mat → pobj (object of the preposition “on”)
the (modifies “cat” and “mat”) → det (determiner)
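A minimal sketch that reproduces these relations with spaCy (assuming the en_core_web_sm model is installed; label names such as prep and pobj follow spaCy's English dependency scheme):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The cat sat on the mat.")

for token in doc:
    # each word points to its head word with a labelled dependency relation
    print(f"{token.text:<4} --{token.dep_}--> {token.head.text}")
# The  --det-->   cat
# cat  --nsubj--> sat
# sat  --ROOT-->  sat
# on   --prep-->  sat
# the  --det-->   mat
# mat  --pobj-->  on
# .    --punct--> sat
```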