
Course Name: Introduction to Natural Language Processing

7-04-2025
Course Objectives:
The objective of this course is to:

 Gain basic knowledge of Natural Language Processing.

 Apply text cleaning, morphological, and lexical analysis to text data.

 Enrich algorithmic knowledge of the application of various syntactic and semantic parsing techniques in the NLP process.

 Build strong knowledge of natural language generation in the NLP process.

Module-3: Syntactic Analysis (POS Tagging & Chunking)

 Introduction to Syntactic Parsing

 Part-of-Speech (POS) Tagging: Rule-based and Statistical Approaches

 Chunking: Shallow Parsing and Phrase Extraction

 Dependency Parsing Basics

 POS tagging using NLP libraries (NLTK, spaCy); Chunking on text data.

Introduction to Syntactic Parsing

Syntactic parsing
 Syntactic parsing is the process of analyzing the structure of sentences according to the rules of a

formal grammar, breaking down text into its constituent parts and identifying grammatical relationships.

 Syntactic parsing can be broadly categorized into two methods:

 Constituency Parsing

 Dependency Parsing

 Constituency parsing focuses on syntactic analysis

 Dependency parsing can handle both syntactic and semantic analysis

Difference between Constituency Parsing and Dependency Parsing

 Why Do We Need Syntactic Analysis?

 Parsing and Understanding: It helps computers parse sentences and understand their grammatical structure,
allowing them to distinguish subjects from objects, identify verb predicates, and recognize modifiers. This
understanding is crucial for various NLP tasks like machine translation and sentiment analysis.

 Ambiguity Resolution: Natural language often contains ambiguities that humans effortlessly resolve through
context and syntax. Syntactic analysis aids in resolving such ambiguities by providing a structured framework for
interpretation.

 Grammar and Language Generation: Syntax is essential not only for comprehension but also for language generation.
It guides the generation of coherent and grammatically correct sentences, making chatbots and text generators more
effective.

 Information Extraction: In applications like information retrieval and question answering, identifying syntactic
patterns and relationships in text is essential for extracting relevant information from unstructured data.

Types of Syntactic Structures
 Syntactic structures encompass various elements, including:

 Phrases: These are groups of words that function together as a single unit within a sentence. Common types
include noun phrases (NP) and verb phrases (VP).
 Clauses: Clauses are larger units of language that consist of a subject and a predicate. They can be independent (main
clauses) or dependent (subordinate clauses).
 Dependency Relations: In dependency grammar, words in a sentence are linked by dependency relations, showing
how they depend on each other. These relations help capture the syntactic relationships between words.
 Constituency Relations: Constituency grammar breaks sentences into constituents, such as noun phrases and verb
phrases, which represent the syntactic structure of a sentence.

Example of Syntactic Analysis in NLP
 Syntactic analysis in NLP involves parsing a sentence to understand its grammatical structure. Here’s an
example:

Sentence: “The quick brown fox jumps over the lazy dog.”

 Tokenization: The first step is to tokenize the sentence, breaking it down into individual words:

“The” | “quick” | “brown” | “fox” | “jumps” | “over” | “the” | “lazy” | “dog” | “.”
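The tokenization step can be sketched with a small regular expression, a toy stand-in for library tokenizers such as nltk.word_tokenize (the pattern is an assumption for this example, not what any library uses internally):

```python
import re

def tokenize(sentence):
    # Split into word tokens and single punctuation characters.
    return re.findall(r"\w+|[^\w\s]", sentence)

tokens = tokenize("The quick brown fox jumps over the lazy dog.")
print(tokens)
# → ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog', '.']
```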

 Part-of-Speech Tagging: Next, part-of-speech tags are assigned to each word to identify its grammatical
role:

“The” (Article) | “quick” (Adjective) | “brown” (Adjective) | “fox” (Noun) | “jumps” (Verb) | “over”
(Preposition) | “the” (Article) | “lazy” (Adjective) | “dog” (Noun) | “.” (Punctuation)
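A minimal lookup-based sketch of this tagging step, assuming a hand-built lexicon covering just this sentence (real taggers use trained models, e.g. nltk.pos_tag, rather than a fixed dictionary):

```python
# Hypothetical per-sentence tag lexicon; illustration only.
LEXICON = {
    "The": "Article", "quick": "Adjective", "brown": "Adjective",
    "fox": "Noun", "jumps": "Verb", "over": "Preposition",
    "the": "Article", "lazy": "Adjective", "dog": "Noun", ".": "Punctuation",
}

def tag(tokens):
    # Look each token up; fall back to "Unknown" for unseen words.
    return [(tok, LEXICON.get(tok, "Unknown")) for tok in tokens]

print(tag(["The", "quick", "brown", "fox"]))
# → [('The', 'Article'), ('quick', 'Adjective'), ('brown', 'Adjective'), ('fox', 'Noun')]
```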

Example of Syntactic Analysis in NLP
 Dependency Parsing: Syntactic analysis involves creating a parse tree or dependency graph to show the
relationships between words. Here’s a simplified representation of the dependency structure:

 In this dependency parse tree:


“jumps” is the main verb, and “fox” is the sentence’s subject.
“fox” and “dog” are nouns, and “quick” and “lazy” are adjectives modifying them.
“over” is a preposition that connects “jumps” and “dog.”

This syntactic analysis helps the NLP system understand the grammatical relationships within the sentence,
which can be valuable for various NLP tasks, such as information extraction, sentiment analysis, and machine
translation.
Part-of-Speech (POS) Tagging
 Part-of-Speech (POS) tagging is a fundamental task in Natural Language Processing (NLP) that involves
assigning a grammatical category (such as noun, verb, adjective, etc.) to each word in a sentence.
 The goal is to understand the syntactic structure of a sentence and identify the grammatical roles of individual words.
 POS tagging provides essential information for various NLP applications, including text analysis, machine translation,
and information retrieval.

Key Concepts:
1. POS Tags:
• POS tags are short codes representing specific parts of speech. Common POS tags include:
• Noun (NN)
• Verb (VB)
• Adjective (JJ)
• Adverb (RB)
• Pronoun (PRP)
• Preposition (IN)
• Conjunction (CC)
• Determiner (DT)
• Interjection (UH)
2. Tag Sets:
• Different tag sets may be used depending on the POS tagging system or language. For example, the Penn
Treebank POS tag set is widely used in English NLP tasks.
3. Ambiguity:
• Words may have multiple possible POS tags based on context. For example, “lead” can be a noun (the
metal) or a verb (to guide).
Example:
Text: “The cat sat on the mat.”
POS tags:
• The: determiner
• cat: noun
• sat: verb
• on: preposition
• the: determiner
• mat: noun

Methods of POS Tagging

1. Rule-Based Tagging:
Based on handcrafted rules that consider word morphology, context, and syntactic information. It
can be effective but may struggle with ambiguity.

2. Statistical Tagging:
Uses statistical models trained on large annotated corpora to predict POS tags. Hidden Markov
Models (HMMs) and Conditional Random Fields (CRFs) are common statistical approaches.

3. Machine Learning-Based Tagging:
Utilizes machine learning algorithms such as decision trees, support vector machines, or neural
networks to learn patterns from data. Particular emphasis is given to contextual information.

4. Deep Learning-Based Tagging:
Deep learning models, such as recurrent neural networks (RNNs) and long short-term memory
networks (LSTMs), are employed for POS tagging, capturing complex contextual dependencies.

Rule Based POS Tagging
 Rule-based part-of-speech (POS) tagging is a method of labeling words with their corresponding
parts of speech using a set of pre-defined rules.
 In a rule-based POS tagging system, words are assigned POS tags based on their characteristics
and the context in which they appear. For example, a rule-based POS tagger might assign the tag
“noun” to any word that ends in “-tion” or “-ment,” as these suffixes are often used to form nouns.

Example:
• If the word ends in “-tion,” assign the tag “noun.”
• If the word ends in “-ment,” assign the tag “noun.”
• If the word is all uppercase, assign the tag “proper noun.”
• If the word is a verb ending in “-ing,” assign the tag “verb.”

Iterate through the words in the text and apply the rules to each word in turn. For
example:
• “Nation” would be tagged as “noun” based on the first rule.
• “Investment” would be tagged as “noun” based on the second rule.
• “UNITED” would be tagged as “proper noun” based on the third rule.
• “Running” would be tagged as “verb” based on the fourth rule.
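The four rules and the iteration above can be sketched as a toy rule-based tagger (the rule order and the "unknown" fallback are assumptions of this sketch):

```python
def rule_based_tag(word):
    # Apply the illustrative rules from the text, in order.
    if word.lower().endswith("tion"):
        return "noun"
    if word.lower().endswith("ment"):
        return "noun"
    if word.isupper():
        return "proper noun"
    if word.lower().endswith("ing"):
        return "verb"
    return "unknown"   # no rule matched

for w in ["Nation", "Investment", "UNITED", "Running"]:
    print(w, "->", rule_based_tag(w))
```

Rule order matters here: a word like "RUNNING" would hit the all-uppercase rule before the "-ing" rule, which is exactly the kind of ambiguity rule-based systems struggle with.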
Statistical POS Tagging
 Statistical part-of-speech (POS) tagging is a method of labeling words with their corresponding
parts of speech using statistical techniques. This is in contrast to rule-based POS tagging, which
relies on pre-defined rules, and to unsupervised learning-based POS tagging, which does not use
any annotated training data.
 In statistical POS tagging, a model is trained on a large annotated corpus of text to learn the
patterns and characteristics of different parts of speech. The model uses this training data to predict
the POS tag of a given word based on the context in which it appears and the probability of
different POS tags occurring in that context.

Example of how a statistical POS tagger might work:


1. Collect a large annotated corpus of text and divide it into training and testing sets.
2. Train a statistical model on the training data, using techniques such as maximum likelihood
estimation or hidden Markov models.
3. Use the trained model to predict the POS tags of the words in the testing data.

4. Evaluate the performance of the model by comparing the predicted tags to the true tags in the
testing data and calculating metrics such as precision and recall.
5. Fine-tune the model and repeat the process until the desired level of accuracy is achieved.
6. Use the trained model to perform POS tagging on new, unseen text.
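Steps 1–3 can be sketched with a unigram (most-frequent-tag) model, a simple maximum-likelihood baseline; the three-sentence training corpus here is invented purely for illustration:

```python
from collections import Counter, defaultdict

# Tiny hand-annotated corpus standing in for a large treebank.
train = [
    [("the", "DT"), ("cat", "NN"), ("sat", "VBD")],
    [("the", "DT"), ("dog", "NN"), ("sat", "VBD")],
    [("a", "DT"), ("cat", "NN"), ("runs", "VBZ")],
]

# Count how often each word carries each tag (maximum likelihood estimate).
counts = defaultdict(Counter)
for sent in train:
    for word, tag in sent:
        counts[word][tag] += 1

def predict(word, default="NN"):
    # Choose the most frequent tag seen for this word in training;
    # back off to a default tag for unseen words.
    return counts[word].most_common(1)[0][0] if word in counts else default

print([(w, predict(w)) for w in ["the", "cat", "sat"]])
# → [('the', 'DT'), ('cat', 'NN'), ('sat', 'VBD')]
```

An HMM tagger extends this idea by also modeling tag-to-tag transition probabilities instead of looking at each word in isolation.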
Importance and Applications
 Syntactic Analysis: POS tagging is crucial for understanding the grammatical structure of a sentence,
enabling syntactic analysis. It helps identify the subject, verb, object, and other syntactic elements.
 Semantic Analysis: POS tags contribute to understanding the meaning of words in context. For
example, distinguishing between a noun and a verb can significantly impact the interpretation of a
sentence.
 Information Retrieval: POS tagging is used in information retrieval systems to improve the
precision and relevance of search results. For instance, searching for “NN” (noun) in a document can
prioritize nouns over other words.
 Named Entity Recognition (NER): POS tags play a role in named entity recognition by providing
information about the grammatical category of words. For example, recognizing that “New York”
is a proper noun.
 Machine Translation: POS tagging is essential in machine translation to ensure accurate
translation based on the grammatical structure of sentences.

Chunking: Shallow Parsing and Phrase Extraction
 Shallow parsing, also known as chunking, is a type of natural language processing (NLP) technique
that aims to identify and extract meaningful phrases or chunks from a sentence.
 Shallow parsing focuses on identifying individual phrases or constituents, such as noun phrases, verb
phrases, and prepositional phrases.
 Full parsing involves analyzing the entire grammatical structure of a sentence, which can be
computationally intensive and time-consuming. Shallow parsing, on the other hand, involves
identifying and extracting only the most important phrases or constituents, making it faster and more
efficient than full parsing.
Key Steps in Shallow parsing
 The first step is sentence segmentation, where a sentence is divided into individual words or
tokens.
 The next step is part-of-speech tagging, where each token is assigned a grammatical category, such
as noun, verb, or adjective.
 The final step is to identify and extract the relevant phrases or constituents from the sentence. This is
typically done using pattern matching or machine learning algorithms that have been trained to
recognize specific types of phrases or constituents.
Difference between Full Parsing and Shallow Parsing
Phrase Structure and Chunks

A phrase is a group of words that work together as a unit in a sentence. Each phrase
has a head word (main word) and possibly other modifiers. Phrases don’t necessarily
form a complete sentence on their own.
In chunking (shallow parsing), we group these phrases into "chunks", which are flat,
non-overlapping sequences of words representing partial syntactic structure.

Types of Phrases (Chunks)

Noun phrase chunking, which involves identifying and extracting all the noun phrases in
a sentence. Noun phrases typically consist of a noun and any associated adjectives,
determiners, or modifiers. For example, in the sentence “The black cat sat on the mat,”
the noun phrase “the black cat” can be identified and extracted using noun phrase chunking.

Verb phrase chunking, which involves identifying and extracting all the verb phrases in a
sentence. Verb phrases typically consist of a verb and any associated adverbs, particles,
or complements. For example, in the sentence “She sings beautifully,” the verb phrase
“sings beautifully” can be identified and extracted using verb phrase chunking.

Types of Phrases (Chunks)

Adjective Phrase (ADJP) chunking extracts groups of words built around an adjective that
describes or modifies a noun. The adjective is the head of the phrase, and it may be
accompanied by modifiers such as adverbs. Example: “very beautiful”
(Adverb: very; Adjective (head): beautiful), as in: “She is [ADJP very beautiful].”

Adverb Phrase (ADVP) chunking extracts groups of words that work together to modify or
describe a verb, adjective, or another adverb. The head of the phrase is an adverb,
and it may be surrounded by modifiers or complements. Example: “very quickly”
(Modifier: very; Adverb (head): quickly), as in: “She ran [ADVP very quickly] to catch the bus.”

Example of Chunking
Sentence: “The quick brown fox jumps over the lazy dog.”
POS Tags: DT JJ JJ NN VBZ IN DT JJ NN
Chunks: [NP The quick brown fox] [VP jumps] [PP over] [NP the lazy dog]
Phrase Extraction

Phrase Extraction is the process of identifying and retrieving specific meaningful phrases
—like noun phrases (NP) or verb phrases (VP)—from text that has been chunked
(shallow parsed). It’s a common step in many NLP pipelines when you want key pieces
of information without doing full parsing.
How Phrase Extraction Works
1. Tokenize and POS-tag the sentence
2. Apply a chunking pattern (e.g., NP: {<DT>?<JJ>*<NN>+})
3. Extract the chunks (subtrees labeled "NP", "VP", etc.)
4. Return the chunked phrases as strings or spans
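The four steps above can be sketched in plain Python by emulating the NP pattern {<DT>?<JJ>*<NN>+} with a regular expression over the encoded tag sequence (a toy stand-in for nltk.RegexpParser; the string-encoding trick is an assumption of this sketch, not how the library works internally):

```python
import re

def chunk_np(tagged):
    # Encode the POS tags as one string so a regex can match over the
    # sequence, mirroring the chunk pattern NP: {<DT>?<JJ>*<NN>+}.
    tag_str = "".join(f"<{t}>" for _, t in tagged)
    phrases = []
    for m in re.finditer(r"(?:<DT>)?(?:<JJ>)*(?:<NN>)+", tag_str):
        start = tag_str[:m.start()].count("<")   # token index where match begins
        end = start + m.group(0).count("<")      # token index just past the match
        phrases.append(" ".join(w for w, _ in tagged[start:end]))
    return phrases

sentence = [("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
            ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"),
            ("lazy", "JJ"), ("dog", "NN")]
print(chunk_np(sentence))
# → ['The quick brown fox', 'the lazy dog']
```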

Extracting Noun/Verb Phrases from Chunked Output

After chunking a sentence, we typically get a tree-like structure or tagged sequence (e.g. using
BIO tags or regex-based chunking). From this, we extract phrases such as:
Noun Phrases (NP) → entities, subjects, objects
Verb Phrases (VP) → actions, events
Example: “The talented musician played a beautiful melody.”
After chunking: [NP The talented musician] [VP played] [NP a beautiful melody]
Extracted Noun Phrases:
“The talented musician”
“a beautiful melody”
Extracted Verb Phrase: “played”
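Extracting labeled phrases from a bracketed chunk string like the one above can be sketched as follows (the bracket format and the helper name are assumptions of this example):

```python
import re

def extract_phrases(chunked, label):
    # Pull the text of every [LABEL ...] bracket out of a chunked string.
    return re.findall(r"\[" + re.escape(label) + r" ([^\]]+)\]", chunked)

chunked = "[NP The talented musician] [VP played] [NP a beautiful melody]"
print(extract_phrases(chunked, "NP"))
# → ['The talented musician', 'a beautiful melody']
print(extract_phrases(chunked, "VP"))
# → ['played']
```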

Applications of Phrase Extraction
1. Search Engines
Extracts key phrases to better index and match user queries with content. Improves autocomplete
and suggested search features.

2. Information Retrieval
Pulls out relevant entities or concepts from documents (like names, events, topics). Enhances
semantic search by focusing on meaningful units.

3. Text Summarization
• Uses noun/verb phrases as building blocks for concise summaries.
• Helps identify important actions and entities.

4. Named Entity Recognition (NER)
Noun phrases extracted during preprocessing are often candidate entities for NER tasks.

5. Question Answering Systems
Extracts candidate answer phrases from a corpus. Matches question patterns to possible answer chunks.
Why Chunking is Essential

Chunking is a powerful NLP technique that enhances POS tagging by grouping words
into meaningful phrases.

✅ It is essential for entity detection, information extraction, and text analysis.


✅ It allows rapid data processing without analyzing the entire text.
✅ It provides better phrase-level understanding than POS tagging alone.
✅ It is widely used in search engines, AI assistants, financial analysis, and e-commerce
applications.

Challenges in Chunking

✅ Ambiguity in POS tagging


✅ Nested and overlapping chunks
✅ Language-specific rules and variation
✅ Lack of annotated data for some domains/languages

Dependency Parsing: The Basics
 Dependency Parsing is a process in Natural Language Processing (NLP) that analyzes the
grammatical structure of a sentence by identifying relationships (dependencies) between words.
Instead of building hierarchical phrase structures (as in constituency parsing), it represents a
sentence as a directed graph in which each word is a node and edges (arrows) represent
grammatical relationships between head words and their dependents.

 Example : “The cat sat on the mat.”


 Dependency Tree (simplified)

Dependency Relations:
sat → root
cat → nsubj (nominal subject of “sat”)
on → prep (preposition modifying “sat”)
mat → pobj (object of the preposition “on”)
the → det (determiner, modifying “cat” and “mat”)
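The relations above can be represented as simple (dependent, relation, head) triples, a minimal sketch of how a dependency graph is stored; real parsers such as spaCy expose richer token objects with .head and .dep_ attributes:

```python
# The parse of "The cat sat on the mat." as (dependent, relation, head) triples.
# A head of None marks the root of the sentence.
deps = [
    ("sat", "root", None),
    ("cat", "nsubj", "sat"),
    ("on", "prep", "sat"),
    ("mat", "pobj", "on"),
    ("the", "det", "cat"),
    ("the", "det", "mat"),
]

def dependents(head):
    # All words that attach directly to the given head word.
    return [(w, rel) for w, rel, h in deps if h == head]

print(dependents("sat"))
# → [('cat', 'nsubj'), ('on', 'prep')]
```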
