Natural Language Processing
Course code: CSE3015
Module 3
Parsing Structure in Text
Prepared by
Dr. Venkata Rami Reddy Ch
SCOPE
Syllabus
• Shallow vs Deep parsing
• Approaches in parsing,
• Types of parsing-
• Regex parser,
• Dependency parser,
• Constituency Parsing
• Meaning Representation:
• Logical Semantics,
• Semantic Role Labelling,
• Distributional Semantics
• Discourse Processing: Anaphora and Coreference Resolution
Parsing in NLP
• Parsing is the process of analyzing the grammatical structure of a sentence to determine its
syntactic or semantic meaning.
• It is a crucial component of NLP and helps machines understand human language.
The main purposes of parsing in NLP include:
1. Understanding Sentence Structure – It helps in breaking down a sentence into its
grammatical components, such as nouns, verbs, subjects, and objects.
2. Syntactic Analysis – It determines the relationships between words in a sentence, which
is useful for applications like machine translation, question answering, and chatbots.
3. Semantic Analysis – Parsing provides a foundation for understanding meaning by identifying
roles like subjects, predicates, and modifiers.
4. Machine Translation – Syntax-based translation models rely on parsing to preserve the
grammatical structure in translated languages.
5. Question Answering and Chatbots – Understanding the structure of user queries helps
generate more relevant and meaningful responses.
Parsing in NLP
• The word syntax originates from the Greek word syntaxis, meaning “arrangement”, and
refers to how the words are arranged together.
• Sentence = S = Noun Phrase + Verb Phrase + Preposition Phrase
S = NP + VP + PP
• The different word groups that exist according to English grammar rules are:
Noun Phrase(NP): Determiner + Nominal Nouns = DET + Nominal
Verb Phrase (VP): Verb + range of combinations
Prepositional Phrase (PP): Preposition + Noun Phrase = P + NP
Shallow Parsing (Chunking) in NLP
• Shallow parsing, also known as chunking, is used to extract phrases or chunks from
sentences without analyzing their deeper syntactic structure.
• It identifies syntactic constituents (chunks) such as noun phrases (NPs), verb phrases (VPs),
and prepositional phrases (PPs).
Steps in Shallow Parsing
1. Tokenization – Splitting text into words.
2. POS Tagging – Assigning part-of-speech (POS) tags to words.
3. Chunking – Grouping words into meaningful phrases (noun phrases, verb phrases, etc.).
Example of Shallow Parsing
Sentence
• "The quick brown fox jumps over the lazy dog."
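A minimal sketch of shallow parsing for this sentence with NLTK's `RegexpParser`; the POS tags are supplied by hand here so the example does not depend on a downloaded tagger model:

```python
from nltk import RegexpParser

# Hand-tagged tokens for "The quick brown fox jumps over the lazy dog."
tagged = [("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
          ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"),
          ("dog", "NN")]

# Chunk grammar: a noun phrase is an optional determiner,
# any number of adjectives, then a noun.
chunker = RegexpParser("NP: {<DT>?<JJ>*<NN>}")
tree = chunker.parse(tagged)
print(tree)
```

The chunker groups "The quick brown fox" and "the lazy dog" into NP chunks while leaving the verb and preposition outside any chunk, which is exactly the "phrases only, no deep structure" behaviour of shallow parsing.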
Deep Parsing in NLP
• Deep parsing, also known as full parsing or syntactic parsing, is the process of analyzing the
full syntactic structure of a sentence, typically producing a parse tree or dependency graph.
• Unlike shallow parsing , which only identifies phrases like noun phrases (NPs) and verb
phrases (VPs), deep parsing provides a detailed hierarchical structure of the sentence,
including grammatical relations.
• Key Aspects of Deep Parsing
Full Sentence Structure
1. Identifies subject, object, verb, modifiers, etc.
2. Determines phrase structure (e.g., noun phrases, verb phrases).
3. Builds a parse tree (constituency tree) or dependency graph.
Deep Parsing in NLP
• A parser is used to implement the task of parsing.
• It may be defined as a software component that takes input data (text) and gives a
structural representation of the input after checking for correct syntax as per a formal
grammar.
• It builds a data structure, generally in the form of a parse tree, syntax tree, or other
hierarchical structure.
• The parser splits the sequence of text into groups of words that are related as phrases.
[Diagram: Input text → Parser → Valid parse tree, with a set of grammar rules (productions) feeding the parser]
Example parse tree
Input: Tom ate an apple
Shallow vs. Deep Parsing
Shallow Parsing:
• Generates only the phrases of the syntactic structure of a sentence.
• Can be used for less complex NLP applications.
• Also called chunking.
• Applications: NER, chunking, POS tagging.
Deep Parsing:
• Generates the complete syntactic structure of the sentence.
• Suitable for complex NLP applications.
• Also called full parsing.
• Applications: syntax analysis, machine translation, question answering.
Context-Free Grammar (CFG) in Parsing
• A context-free grammar (CFG) is a set of production rules used to generate all the possible sentences in a given language.
• CFG is widely used for syntactic parsing, where a parser constructs a parse tree to analyze the structure of a sentence
based on the given grammar.
• These rules specify how individual words in a sentence can be grouped to form constituents such as noun phrases, verb
phrases, prepositional phrases, etc.
Key Components of CFG
• A CFG is defined as a 4-tuple:
G=(N,Σ,P,S)
where:
• N (Non-terminals): A set of non-terminal symbols (e.g., S, NP, VP).
• Σ (Terminals): A set of terminal symbols (actual words or tokens in a language).
• P (Production Rules): Rules that define how non-terminals can be replaced by other non-terminals or terminals.
• S (Start Symbol): A special non-terminal from which parsing starts.
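As a sketch, the 4-tuple above can be written down directly with NLTK's `CFG` class; the toy rules below are illustrative, not a full grammar of English:

```python
from nltk import CFG

# N = {S, NP, VP, Det, N, V}, Σ = the quoted words,
# P = the rules below, S = the start symbol (the left-hand side of the first rule).
grammar = CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the' | 'a'
N -> 'dog' | 'cat'
V -> 'chases' | 'sees'
""")

print(grammar.start())             # the start symbol
print(len(grammar.productions()))  # each '|' alternative counts as its own production
```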
Context-Free Grammar (CFG) in Parsing
CFG:
• S → NP VP
• NP → Det N
• VP → V NP
• Det → "the" | "a"
• N → "dog" | "cat"
• V → "chases" | "sees"
• A parse tree (or syntax tree) plays a crucial role in NLP by visually representing the
syntactic structure of a sentence based on a given grammar.
• It is essential for understanding sentence structure and meaning.
Derivation of "the dog chases a cat":
• S ⇒ NP VP
• ⇒ Det N VP
• ⇒ the dog VP
• ⇒ the dog V NP
• ⇒ the dog chases NP
• ⇒ the dog chases Det N
• ⇒ the dog chases a N
• ⇒ the dog chases a cat
Approaches in parsing
• Top-Down Parsing
• Bottom-up parsing
Top-Down parsing
• Top-down parsing starts from the root (start symbol) of the grammar and tries to derive the
input sentence by applying production rules.
• It recursively expands the non-terminals until the input is matched or parsing fails.
• Top-down, left-to-right, and backtracking are the prominent search strategies used in this
approach.
Steps of Top-Down Parsing
1. Start with the start symbol
• Begin with the root node (usually S in a context-free grammar).
2. Expand using grammar rules
• Apply production rules to expand non-terminals in a depth-first manner.
3. Match against the input sentence
• Compare generated terminal symbols with input tokens.
• If a match is found, continue expanding the next non-terminal.
4. Backtrack (if necessary)
• If a production does not match the input tokens, backtrack and try a different production.
5. Repeat until the input is fully parsed
• Continue until the entire input sentence is matched with a valid parse tree.
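The steps above can be sketched with NLTK's `RecursiveDescentParser`, a backtracking, depth-first, top-down parser; the toy grammar is assumed for illustration:

```python
from nltk import CFG, RecursiveDescentParser

grammar = CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the' | 'a'
N -> 'dog' | 'cat'
V -> 'chases' | 'sees'
""")

# The parser expands from S downward and backtracks on mismatches.
parser = RecursiveDescentParser(grammar)
sentence = "the dog chases a cat".split()
for tree in parser.parse(sentence):
    print(tree)
```

Passing `trace=2` to `RecursiveDescentParser` prints each expansion, match, and backtracking step, which makes the search strategy visible.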
Top-Down parsing
Bottom-up parsing
• Bottom-up parsing starts with the input words (tokens) and applies grammar rules in reverse
to construct the parse tree, moving from the leaves to the root.
• The goal of reaching the starting symbol S is accomplished through a series of reductions;
when the right-hand side of some rule matches the substring of the input string, the
substring is replaced with the left-hand side of the matched production, and the process is
repeated until the starting symbol is reached.
How It Works:
1. Start with the input sentence (sequence of words).
2. Build phrases (subtrees) by matching grammar rules.
3. Continue merging phrases until reaching the root (the start symbol of the grammar).
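A minimal sketch of bottom-up parsing with NLTK's `ShiftReduceParser`, which shifts tokens onto a stack and reduces them whenever the right-hand side of a rule matches the top of the stack; the toy grammar is assumed for illustration (shift-reduce parsing can miss parses on ambiguous grammars, so chart parsers are often preferred in practice):

```python
from nltk import CFG, ShiftReduceParser

grammar = CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the' | 'a'
N -> 'dog' | 'cat'
V -> 'chases' | 'sees'
""")

# The parser works from the words (leaves) upward, reducing until it reaches S.
parser = ShiftReduceParser(grammar)
sentence = "the dog chases a cat".split()
for tree in parser.parse(sentence):
    print(tree)
```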
Sentence: John is playing a game
Sentence: John ate the cake
Cons of Top-Down Parsing
1. Backtracking Overhead – Can suffer from excessive backtracking in ambiguous grammars,
making parsing inefficient.
2. Limited Context Sensitivity – Struggles with complex languages where parsing depends on
deeper contextual information.
Cons of Bottom-Up Parsing
1. Complex Implementation – More difficult to implement compared to top-down
approaches.
2. Higher Memory Usage – Stores intermediate parse states, leading to increased memory
consumption.
3. Slower for Simple Grammars – For straightforward languages, top-down parsers (like
recursive descent) may be faster.
4. Less Readable Parse Trees – The parse tree is built in a non-intuitive order (from leaves to
root), making debugging harder.
Types of parsing/parsers
• Regex parser,
• Dependency parser,
• Constituency Parsing
Regexp Parser
• A Regex Parser in NLP is a rule-based method for chunking and parsing text using regular
expressions applied to Part-of-Speech (POS)-tagged words.
• A Regexp Parser uses regular expressions defined in the form of a grammar and applies them
to a POS-tagged string to generate a parse tree.
Steps of Regex Parser
1. Input Sentence
2. Tokenization → Split sentence into words
3. POS Tagging → Assign part-of-speech labels
4. Apply Regex Parser → Identify phrase structures
5. Extract Key Phrases → Find NP, VP, PP, etc.
6. Display Parse Tree → Draw a visual representation
Regexp Parser
Define the Grammar Rules
Create regex-based rules to identify phrases like Noun Phrases (NP), Verb Phrases (VP), and
Prepositional Phrases (PP)
grammar = r"""
NP: {<DT>? <JJ>* <NN>*}   # Noun Phrase (Determiner + Adjective(s) + Noun(s))
P: {<IN>}                 # Preposition (e.g., in, on, at)
V: {<V.*>}                # Verb (any verb form)
PP: {<P> <NP>}            # Prepositional Phrase (P + NP)
VP: {<V> <NP|PP>*}        # Verb Phrase (V + NP/PP)
"""
Create a Regex Parser
reg_parser = RegexpParser(grammar)
Tokenize and POS Tag a Sentence
sentence = "The quick brown fox jumps over the lazy dog"
pos_tags = pos_tag(word_tokenize(sentence) )
Parse the Sentence Using the Regex Parser
parsed_sentence = reg_parser.parse(pos_tags)
Regexp Parser
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
from nltk import pos_tag, word_tokenize, RegexpParser

text = "The quick brown fox jumps over the lazy dog"
tokens = word_tokenize(text)
tags = pos_tag(tokens)

reg_parser = RegexpParser("""
NP: {<DT>?<JJ>*<NN>}   # To extract Noun Phrases
P: {<IN>}              # To extract Prepositions
V: {<V.*>}             # To extract Verbs
PP: {<IN><NP>}         # To extract Prepositional Phrases
VP: {<V.*><NP|PP>*}    # To extract Verb Phrases
""")
result = reg_parser.parse(tags)
print('Parse Tree:', result)
result.draw()
Constituency Parsing
• Constituency Parsing in NLP is a syntactic analysis technique that breaks down a sentence
into its constituent components (phrases) based on a context-free grammar (CFG).
• The output of a constituency parser is typically a parse tree, which represents the
hierarchical structure of the sentence
• It creates a parse tree that represents the syntactic structure of a sentence according to
grammar rules.
• The process involves identifying noun phrases, verb phrases, and other constituents, and
then determining the relationships between them.
• It helps in understanding the grammatical structure of sentences, which is crucial for
various NLP tasks such as text summarization, machine translation, question answering, and
text classification.
Constituency Parsing
Steps in Constituency Parsing:
[Link] a Context-Free Grammar (CFG)
2. Constructing a Parse Tree
• A parse tree can be constructed from the CFG using either a Top-Down or Bottom-Up
parsing approach.
Example:
CFG:
S → NP VP
NP → Det N
VP → V NP
Det → "the" | "a"
N → "dog" | "cat"
V → "chases" | "sees"
Constituency Parsing
• The ChartParser is a bottom-up dynamic programming parsing algorithm that efficiently
constructs a parse tree by storing intermediate results in a chart table.

import nltk
from nltk import CFG

# Define a Context-Free Grammar (CFG)
grammar = CFG.fromstring("""
S -> NP VP
NP -> DT NN
VP -> VBD PP
PP -> IN NP
DT -> 'The' | 'the'
NN -> 'cat' | 'mat'
VBD -> 'sat'
IN -> 'on'
""")

# Create a ChartParser (bottom-up parser)
parser = nltk.ChartParser(grammar)
sentence = "The cat sat on the mat".split()

# Generate parse tree
for tree in parser.parse(sentence):
    print(tree)
    tree.pretty_print()
Applications of Constituency Parsing
1. Machine Translation
2. Information Retrieval
3. Question Answering
4. Text Summarization
5. Sentiment Analysis
6. Grammar Checking
Dependency Parsing
• Dependency parsing, on the other hand, focuses on identifying the grammatical
relationships between words in a sentence.
• The output is a dependency tree or graph that shows how words are related to each other.
• It involves constructing a tree-like structure of dependencies, where
each word is represented as a node and the relationships between words
are represented as edges.
• Dependency Parsing is a powerful technique for understanding the
meaning and structure of language, and is used in a variety of
applications, including text classification, sentiment analysis, and
machine translation.
Key Concepts of Dependency Parsing
1. Dependency Relations
• A sentence is represented as a directed graph where words (nodes) are connected by
dependency relations (edges).
• Each word (except the root) depends on a head (governor), forming a hierarchical tree.
2. Head and Dependent
• Head: The main word in a phrase (e.g., a verb in a sentence).
• Dependent: A word that modifies or depends on the head.
3. Root
• The central word of the sentence, usually the main verb, to which all other words are
directly or indirectly connected.
4. Dependency Types (Labels)
• Common relations include:
  • nsubj (nominal subject) – The subject of a verb.
  • dobj (direct object) – The object receiving the action.
  • amod (adjectival modifier) – An adjective modifying a noun.
  • prep (prepositional modifier) – A preposition connecting words.
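The concepts above can be sketched as a plain data structure; a library such as spaCy would produce these relations automatically, but a hand-annotated example (assumed here for illustration) makes the head/dependent/root structure explicit:

```python
# Dependency parse of "The dog chases a cat", hand-annotated for illustration.
# Each dependent maps to (head, relation); the root has no entry because it has no head.
deps = {
    "The": ("dog", "det"),       # determiner modifying the noun
    "dog": ("chases", "nsubj"),  # nominal subject of the verb
    "a":   ("cat", "det"),
    "cat": ("chases", "dobj"),   # direct object of the verb
}
root = "chases"  # the main verb; every other word depends on it directly or indirectly

def head_chain(word):
    """Follow head links from a word up to the root."""
    chain = [word]
    while word in deps:
        word = deps[word][0]
        chain.append(word)
    return chain

print(head_chain("The"))  # ['The', 'dog', 'chases']
```

Following head links from any word always terminates at the root, which is exactly the "directly or indirectly connected" property described above.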