NLP Unit-2
Parsing is the process of breaking a sentence down into its constituent parts (such as words, phrases, and clauses) and identifying the syntactic relationships among them. It reveals the grammatical structure of a sentence and is used in tasks like machine translation, question answering, and sentiment analysis.
Parsing uncovers the hidden structure of linguistic input. In many applications involving natural language, this structure must be made explicit before further processing can take place. The syntactic analysis of language provides a means to explicitly discover the various relationships that hold between the words and phrases of a sentence.
1. Syntactic Parsing: analyzes the grammatical structure of a sentence, identifying constituents such as phrases and clauses and the relations among them. Constituency parsing and dependency parsing are the two main styles of syntactic parsing.
2. Semantic Parsing: goes beyond grammar to recover meaning, for example the relationships between verbs (predicates) and the entities involved in the actions or events they describe, yielding predicate-argument analyses.
Syntactic structure is also needed for generation tasks such as machine translation, where the output should sound like it was spoken by a native speaker of the language.
1. Constituency Parsing
Represents the sentence as a hierarchy of nested constituents (phrases), such as noun phrases and verb phrases.
2. Dependency Parsing
Represents the sentence as a directed graph, where nodes are words and edges are head-dependent relations labeled with grammatical functions.
Example:
o Dependency Parse:
sat (root)
├── cat (nsubj)
│ └── The (det)
└── on (prep)
└── mat (pobj)
└── the (det)
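As an illustration, here is a minimal sketch using spaCy (assuming the en_core_web_sm model is installed) that prints the head and dependency label of each word of this sentence; exact labels can vary slightly by model version:

import spacy

nlp = spacy.load("en_core_web_sm")          # small English pipeline with a dependency parser
doc = nlp("The cat sat on the mat.")
for token in doc:
    # e.g. "cat --nsubj--> sat", "The --det--> cat"
    print(f"{token.text} --{token.dep_}--> {token.head.text}")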
Constituency parsing analyzes the phrase structure of a sentence according to a grammar of the language. It breaks down sentences into their constituent parts (phrases) and shows how they combine hierarchically. Consider the sentence "The cat sat on the mat" and how a constituency parser would analyze it:
1. Sentence (S): the whole string "The cat sat on the mat" forms a sentence.
2. Noun Phrase (NP): "The cat" is a noun phrase, as it functions as the subject.
3. Verb Phrase (VP): "sat on the mat" is a verb phrase, as it contains the verb and its prepositional-phrase modifier.
4. Prepositional Phrase (PP): "on the mat" is a prepositional phrase, modifying the
verb "sat."
1. Constituents:
o Groups of words that function as a single unit within a sentence.
o Examples: Noun Phrase (NP), Verb Phrase (VP), Prepositional Phrase (PP).
2. Grammar Rules:
o Context-free rules that describe how constituents combine into larger structures.
o Example:
S → NP VP (a Sentence consists of a Noun Phrase followed by a Verb Phrase).
NP → Det N (a Noun Phrase consists of a Determiner followed by a Noun).
3. Parse Tree:
o Example: S
├── NP
│ ├── Det (The)
│ └── N (cat)
└── VP
├── V (sat)
└── PP
├── P (on)
└── NP
├── Det (the)
└── N (mat)
4. Terminals and Non-Terminals:
o Terminals are the actual words (e.g., "The", "cat", "sat"); non-terminals are the phrase and part-of-speech categories (S, NP, VP, PP, Det, N, V, P).
The grammar behind the example parse tree is:
1. S -> NP VP
2. NP -> Det N
3. VP -> V PP
4. PP -> P NP
5. Det -> 'The' | 'the'
6. N -> 'cat' | 'mat'
7. V -> 'sat'
8. P -> 'on'
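A minimal sketch with NLTK that encodes exactly these rules and builds the parse tree shown above:

import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V PP
PP -> P NP
Det -> 'The' | 'the'
N -> 'cat' | 'mat'
V -> 'sat'
P -> 'on'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("The cat sat on the mat".split()):
    tree.pretty_print()   # draws the same constituency tree in ASCII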
Another motivation for parsing comes from the natural language task of summarization, in which several documents about the same topic should be condensed down into a much shorter summary that retains the important information.
Types of Summarization
1. Extractive Summarization: selects important sentences directly from the original text and concatenates them.
Example:
a. Original Text: "The cat sat on the mat. It was a sunny day. The cat enjoyed the warmth."
b. Extractive Summary: "The cat sat on the mat. The cat enjoyed the warmth."
2. Abstractive Summarization: generates a summary by rephrasing the original text, often producing new sentences that capture the main ideas.
Example:
o Original Text: "The cat sat on the mat. It was a sunny day. The cat enjoyed the warmth."
o Abstractive Summary: "The cat relaxed on the mat, enjoying the sunny
weather."
In contemporary natural language processing, syntactic parsers are routinely used as components in applications such as machine translation, question answering, and language summarization. Most of these parsers are trained on treebanks: collections of sentences, each annotated with its syntactic structure in the form of a parse tree. These parse trees represent the hierarchical structure of sentences, breaking them down into constituents like phrases and clauses. Treebanks are essential for data-driven (statistical) parsing.
What is a Treebank?
A treebank is a parsed text corpus in which every sentence is annotated with its syntactic structure.
Treebanks are used to train and evaluate parsers and other NLP models.
More generally, a corpus is a large, structured collection of texts, typically used for linguistic analysis, natural language processing (NLP), and machine learning. Corpora are essential resources for studying language patterns and for training and evaluating NLP models.
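As a small illustration, the sketch below (assuming NLTK's Penn Treebank sample has been downloaded) loads one annotated sentence and its parse tree from a treebank:

import nltk
nltk.download("treebank", quiet=True)        # small sample of the Penn Treebank
from nltk.corpus import treebank

tree = treebank.parsed_sents()[0]            # parse tree of the first annotated sentence
print(tree)                                  # bracketed (S (NP-SBJ ...) (VP ...)) structure
print(len(treebank.parsed_sents()), "annotated sentences in the sample")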
Parsing recovers information that is not explicit in the input sentence. This implies that a parser requires some knowledge, in addition to the input sentence, about the kind of syntactic analysis that should be produced as output. One method to provide such knowledge to the parser is to write down a grammar of the language--a set of rules of syntactic analysis. For instance, one might write a context-free grammar (CFG).
A context-free grammar describes how the strings of a language are generated by recursively combining smaller units (e.g., words, phrases) into larger structures. CFGs are particularly useful for tasks like parsing, syntax analysis, and language generation.
1. Components of a CFG
o Terminals: the actual words (tokens) of the language.
o Non-terminals: phrase and part-of-speech categories (e.g., S, NP, VP) that expand into terminals or other non-terminals.
o Production rules: rules of the form A → α stating how a non-terminal can be rewritten.
o Start symbol: the root non-terminal from which all derivations begin (e.g., S for sentence).
2. Example of a CFG
Production Rules:
S → NP VP
NP → Det N
VP → V NP | V
Det → the
N → cat | dog
V → chased | slept
Start Symbol: S
Example Derivation (for "the cat chased the dog"):
S → NP VP
→ Det N VP
→ the N VP
→ the cat VP
→ the cat V NP
→ the cat chased NP
→ the cat chased Det N
→ the cat chased the N
→ the cat chased the dog
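A minimal sketch with NLTK that encodes this grammar and enumerates a few of the strings it derives (each has a derivation like the one above):

import nltk
from nltk.parse.generate import generate

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP | V
Det -> 'the'
N -> 'cat' | 'dog'
V -> 'chased' | 'slept'
""")

for sentence in generate(grammar, n=5):      # first 5 strings derivable from S
    print(" ".join(sentence))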
3. Parse Trees
A parse tree is a graphical representation of the derivation of a sentence. Each node in the tree corresponds to a non-terminal symbol, and the leaves are the terminal symbols (words).
4. Applications of CFGs
1. Syntactic Parsing: recovering the structure of a sentence according to the grammar rules.
2. Language Generation: producing grammatical sentences by expanding rules from the start symbol.
3. Grammar Checking: flagging sentences that the grammar cannot derive.
4. Machine Translation: using the source-sentence structure to guide translation.
5. Question Answering: matching the structure of a question against the structure of candidate answers.
For example, the following CFG fragment is a grammar of transitive verbs in English: verbs (V) that have a subject and an object noun phrase (NP), plus modifiers of verb phrases (VP) in the form of prepositional phrases (PP).
S -> NP VP
NP -> D N | NP PP | `John' | `pockets'
VP -> V NP | VP PP
V -> `bought'
D -> `a'
N -> `shirt'
PP -> P NP
P -> `with'
Natural language grammars typically have the words w as terminal symbols in the CFG, and they are generated by rules of the type X → w, where X is the part of speech for the word w. For example, in the above CFG the rule V -> `bought' has the part-of-speech symbol V generating the verb `bought'. Such non-terminals are called part-of-speech tags or pre-terminals.
The above CFG can produce a syntax analysis of a sentence like `John bought a shirt with pockets'. Parsing the sentence with the CFG rules gives us two possible derivations for this sentence. In one parse, pockets are a kind of currency which can be used to buy a shirt; in the other parse, which is the more plausible one, John is purchasing a kind of shirt that has pockets.
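A minimal sketch with NLTK's chart parser, using the grammar above, that exhibits both derivations:

import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> D N | NP PP | 'John' | 'pockets'
VP -> V NP | VP PP
PP -> P NP
V -> 'bought'
D -> 'a'
N -> 'shirt'
P -> 'with'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("John bought a shirt with pockets".split()):
    print(tree)    # one tree attaches the PP to the NP, the other to the VP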
However, writing down a CFG for the syntactic analysis of natural language is difficult, because we have to anticipate all the words and constructions that can appear as components in the grammar. We could extend this grammar to include other types of verbs and other syntactic constructions, but listing all possible syntactic constructions of a language by hand is impractical; consider, for instance, listing all the grammar rules in which a particular word can be a participant. Apart from this knowledge acquisition problem, there is another less apparent problem: it turns out that the rules interact with each other in combinatorially explosive ways.
Consider, for example, a simple recursive rule for noun compounds:
N -> N N
Such a rule can apply to its own output, directly or indirectly. With N as the start symbol, for the input `natural' there is one parse tree (N natural); for the input `natural language' we use the recursive rule once and obtain one parse; for the input `natural language processing' we use the recursive rule twice, in two different ways, and obtain two parses.
Note that the ambiguity in the syntactic analysis reflects a real ambiguity: is it the processing of natural language, or language processing that is natural? This issue cannot be resolved by changing the formalism in which the rules are written, e.g. by using finite-state automata, which can be made deterministic but cannot capture the required hierarchical structure. Any system of writing down syntactic rules should represent this ambiguity.
However, by using the recursive rule 3 times we get 5 parses for `natural language processing book', and for longer and longer input noun phrases the count keeps growing: using the recursive rule 6 times we already get 132 parses. In fact, for CFGs it can be proved that the number of parses obtained in this way grows exponentially with the length of the input (the counts follow the Catalan numbers).
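A minimal sketch that counts these parses with NLTK; the counts (1, 2, 5, ...) are exactly the ones discussed above:

import nltk

grammar = nltk.CFG.fromstring("""
N -> N N
N -> 'natural' | 'language' | 'processing' | 'book'
""")

parser = nltk.ChartParser(grammar)
for phrase in ["natural language",
               "natural language processing",
               "natural language processing book"]:
    n_parses = len(list(parser.parse(phrase.split())))
    print(f"{phrase!r}: {n_parses} parse(s)")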
Thus we not only need to know the syntactic rules for a particular language, but we also need to know which analysis is the most plausible for a given input sentence. A treebank addresses both needs: it is a collection of sentences where each sentence is provided a complete syntax analysis. The syntactic analysis for each sentence has been judged by a human expert as the most plausible analysis for that sentence.
Expert knowledge about syntax is typically used as an annotation guideline to help the human experts produce the single most plausible syntactic analysis for each sentence in the treebank.
Treebanks solve the first knowledge acquisition problem of finding the grammar
underlying the syntax analysis since the syntactic analysis is directly given instead of
a grammar.
In fact, the parser does not necessarily need any explicit grammar rules as long
as it can faithfully produce a syntax analysis for an input sentence, although the
information used by the trained parser can be said to represent a set of implicit
grammar rules.
Treebanks solve the second knowledge acquisition problem as well. Since each
sentence in a treebank has been given its most plausible syntactic analysis, supervised
machine learning methods can be used to learn a scoring function over all possible
syntax analyses.
A statistical parser trained on the treebank tries to mimic the human annotation
decisions by using indicators from the input and previous decisions made in the parser
itself to learn such a scoring function. For a given sentence not seen in the training
data, a statistical parser can use this scoring function to return the syntax analysis
that has the highest score, which is taken to be the most plausible analysis for that
sentence. The scoring function can also be used to produce the k-best syntax analyses
for a sentence.
There are two main approaches to syntax analysis which are used to construct syntactic structure representations: constituency (phrase-structure) trees and dependency graphs. The two representations are very closely related to each other, and under some assumptions one can be converted into the other. Dependency representations are typically favoured for languages such as Czech, Turkish, etc., that have free(er) word order, where the arguments of a predicate are often not adjacent to it; constituency representations are typically used to capture information about long-distance dependencies, mostly in languages like English, French, etc.
Understanding syntactic structure means understanding how words relate to each other and how they contribute to the overall meaning of the sentence. There are two primary approaches to representing syntactic structure:
1. Constituency-Based Representation (Phrase-Structure Trees)
Based on constituency grammar, where sentences are broken down into constituents (phrases) such as noun phrases and verb phrases.
Hierarchical Structure: Words are grouped into phrases, which are further grouped into larger phrases, up to the sentence level.
Example: "The cat sat on the mat" (see the parse tree given earlier).
Explanation:
S: Sentence
NP: Noun Phrase ("The cat")
VP: Verb Phrase ("sat on the mat")
V: Verb ("sat")
PP: Prepositional Phrase ("on the mat")
P: Preposition ("on")
2. Dependency-Based Representation (Dependency Graphs)
These graphs are based on the concept of dependency grammar, where words are linked directly to one another. Every word in a sentence (except the root) is linked to exactly one other word, called its head, and each link is labeled with a grammatical relation.
The main philosophy behind dependency graphs is to connect a word- the head of a phrase- with the dependents in that phrase. The notation connects a head with its dependents using directed arcs and is consistent with many different linguistic frameworks. The words in the input sentence are treated as the only vertices in the graph, which are linked together by directed, labeled arcs.
Key Features:
Directed Edges: Show the relationship between a word (dependent) and its head.
Root: The main verb of the sentence, which has no incoming edges.
Dependency labels describe the syntactic role of a word in relation to its head. Some common labels are nsubj (nominal subject), det (determiner), prep (prepositional modifier), and pobj (object of a preposition).
        sat
       /    \
   nsubj    prep
     |        |
    cat       on
     |        |
    det     pobj
     |        |
    The      mat
              |
             det
              |
             the
Explanation: a dependency analysis is produced for an input sentence by identifying the syntactic head of each word in the sentence. This defines a dependency graph, where the nodes are the words of the input sentence and the edges are the head-dependent relations between them.
There are many variants of dependency-style syntactic analysis, but the basic textual format for a dependency tree can be written in the following form: each dependent word specifies the head word in the sentence, and exactly one word depends on the root of the sentence. Projectivity is a constraint imposed by the linear order of words on the dependencies between words. A projective dependency tree is one where, if we put the words in a linear order based on the sentence with the root symbol in the first position, the dependency arcs can be drawn above the words without any two arcs crossing.
Tools: Stanford Parser, NLTK, spaCy, Stanza, UDPipe.
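As an illustration of the textual format just described, a minimal sketch with spaCy (assuming the en_core_web_sm model) prints each word with the index of its head, using 0 for the root:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The cat sat on the mat.")
for token in doc:
    head = 0 if token.head.i == token.i else token.head.i + 1   # 0 marks the root word
    print(token.i + 1, token.text, head, token.dep_)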
Dependency representations make explicit the head-dependent relations between words. Both approaches are essential for understanding syntactic structure and are widely used in modern NLP systems.
Parsing algorithms are procedures that automatically recover the syntactic structure of sentences. These algorithms break down a sentence into components like phrases and words and identify their relationships, such as subjects, objects, and modifiers. We are interested in the most plausible analysis, which we now assume is the analysis that is consistent with a treebank that is used to train a parser. Treebank parsers do not need to have an explicit grammar, but to make the explanation of parsing algorithms simpler we first consider parsing algorithms that use an explicit CFG. Consider the following simple CFG that can be used to derive strings such as `a and b or c':
N -> N `and' N
N -> N `or' N
N -> `a' | `b' | `c'
An important concept for parsing is a derivation. For the input string `a and b or c' one possible derivation is:
`a and b or c'
=> N `and b or c'     # use rule N -> `a'
=> N `and' N `or c'   # use rule N -> `b'
=> N `or c'           # use rule N -> N `and' N
=> N `or' N           # use rule N -> `c'
=> N                  # use rule N -> N `or' N
In this derivation each line is called a sentential form. Furthermore, each line of the derivation applies a rule from the CFG in order to show that the input can, in fact, be derived from the start symbol N. Read from the start symbol down to the input string, this sequence always expands the rightmost non-terminal in each sentential form, so the method is called a rightmost derivation (shown here in reverse). It corresponds to the parse tree:
(N (N (N a)
and
(N b))
or
(N c))
However, a unique derivation sequence is not guaranteed. There can be many different derivations, and as we have seen before the number of derivations can be exponential in the input length. For example, there is another rightmost derivation of the same input `a and b or c' that results in a different parse tree:
(N (N a)
and
(N (N b)
or
(N c)))
Both trees are analyses of the same input `a and b or c'.
The main families of parsing algorithms are:
1. Top-Down Parsing
2. Bottom-Up Parsing
3. Chart Parsing
4. Dependency Parsing
5. Constituency Parsing
6. Transition-Based Parsing
Shift-Reduce Parsing
Shift-reduce parsing is a relatively simple and efficient algorithm designed to construct parse trees for sentences. It is a popular bottom-up technique used in syntax analysis, where the goal is to create a parse tree for a given input based on grammar rules. The process works by reading a stream of tokens (the input) from left to right and working backwards through the grammar rules, building the tree from the leaves up.
1. Input Buffer: This stores the string or sequence of tokens that needs to be parsed.
2. Stack: The parser uses a stack to keep track of which symbols or parts of the parse it has already processed. As it processes the input, symbols are pushed onto and popped off the stack.
3. Parsing Table: Similar to a predictive parser, a parsing table (or an oracle) helps the parser decide whether to shift or reduce at each step.
Shift-reduce parsing works by processing the input left to right and gradually building up a parse tree by shifting tokens onto the stack and reducing them using grammar rules, until it derives the start symbol covering the whole input.
1. Shift Operation:
o In the shift operation, the next input word is moved onto the stack. This means that the parser "shifts" the word from the input sequence and places it on top of the stack.
2. Reduce Operation:
o The reduce operation is applied when a specific set of conditions is met, typically when the top of the stack matches the right-hand side of a grammar rule and can be reduced into a higher-level constituent (e.g., combining a determiner and a noun into a noun phrase).
3. Stack:
o A stack is where words and partial constituents are held while the parser works through the sentence.
4. Input Buffer:
o The input buffer contains the remaining words of the sentence that need to be
processed.
5. Goal:
o The goal of a shift-reduce parser is to reduce the input buffer into a single tree rooted at the start symbol (e.g., S for Sentence).
Parsing Process:
1. Shift: A word is moved from the input buffer onto the stack.
2. Reduce: A combination of words (or previously reduced structures) on top of the stack is replaced by a single non-terminal, according to a grammar rule.
Example grammar rules (as used in classic shift-reduce examples):
S –> S + S
S –> S * S
S –> id
For the sentence "The cat sat on the mat" and the phrase-structure grammar given earlier, the parser starts with Stack: [] and Input Buffer: [The, cat, sat, on, the, mat]. It first shifts "The" (Stack: [The]), then keeps shifting and reducing (Det N → NP, P NP → PP, V PP → VP, NP VP → S) until the Input Buffer is empty ([]).
Now the stack contains the complete parse tree of the sentence, represented as a single "Sentence" (S) node.
Every CFG turns out to have an automaton that is equivalent to it, called a pushdown automaton. A pushdown automaton is simply a finite-state automaton with some additional memory in the form of a stack (or pushdown). This is a limited amount of memory, since only the top of the stack is accessible at any step.
This provides an algorithm for parsing that is general for any given CFG and input string. The algorithm is called shift-reduce parsing; it uses two data structures, a buffer for input symbols and a stack for storing CFG symbols, and is defined as follows:
1. Start with an empty stack; the buffer contains the input string.
2. Exit with success if the top of the stack contains the start symbol of the grammar and the buffer is empty.
3. Choose between the following two steps (if the choice is ambiguous, choose one based on an oracle):
   - Shift a symbol from the buffer onto the stack.
   - If the top k symbols of the stack are α1 … αk, corresponding to the right-hand side of a CFG rule A → α1 … αk, then replace the top k symbols with the left-hand-side non-terminal A.
4. Exit with failure if neither step can be taken.
5. Else, go to Step 2.
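A minimal sketch using NLTK's built-in shift-reduce parser on the toy grammar from the earlier example; note that this parser resolves shift/reduce choices greedily (it has no oracle), so it can fail on grammars where the greedy choice is wrong:

import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V PP
PP -> P NP
Det -> 'The' | 'the'
N -> 'cat' | 'mat'
V -> 'sat'
P -> 'on'
""")

parser = nltk.ShiftReduceParser(grammar, trace=2)   # trace=2 prints every shift and reduce step
for tree in parser.parse("The cat sat on the mat".split()):
    print(tree)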
In transition-based dependency parsing, at each step the parser has a choice: either shift a new token onto the stack, or combine the top two elements of the stack with a head -> dependent link or a dependent <- head link. When using the shift-reduce algorithm in a statistical dependency parser, a trained classifier is used to score and choose among these actions at each step.
1. Hypergraphs
A hypergraph is a generalization of a graph in which a single edge (a hyperedge) can connect any number of vertices. In NLP, hypergraphs are used to represent complex structures like parse trees, dependency graphs, or semantic relations, where traditional graphs (with pairwise edges only) are not expressive enough.
2. Chart Parsing
Chart parsing is a dynamic programming technique used in NLP to efficiently parse sentences by storing intermediate results (partial parses) in a table called a chart, so that no sub-analysis is computed twice.
Example: Parsing the sentence "The cat sat on the mat" using a context-free grammar
(CFG). The chart stores partial parses (e.g., "The cat" as an NP, "sat on the mat" as a
VP).
Hypergraphs and chart parsing are two related concepts used in natural language processing
(NLP) for syntactic parsing. Hypergraphs represent a generalization of traditional parse trees,
allowing for more complex structures and more efficient parsing algorithms. A hypergraph
consists of a set of nodes (representing words or phrases in the input sentence) and a set of
hyperedges, which connect nodes and represent higher-level structures. A chart, on the other
hand, is a data structure used in chart parsing to efficiently store and manipulate all possible partial parses of the input sentence.
Here is an example of how chart parsing can be used to parse the sentence "the cat chased the mouse" with the following grammar:
S -> NP VP
NP -> Det N
VP -> V NP
Det -> the
N -> cat | mouse
V -> chased
1. Initialization: We start by initializing an empty chart with the length of the input
sentence (5 words) and a set of empty cells representing all possible partial parses.
2. Scanning: We scan each word in the input sentence and add a corresponding parse
to the chart. For example, for the first word "the", we add a parse for the non-terminal
symbol Det (Det -> the). We do this for each word in the sentence.
3. Predicting: We use the grammar rules to predict possible partial parses for each
span of words in the sentence. For example, we can predict a partial parse for the span
(1, 2) (i.e., the first two words "the cat") by applying the rule
NP -> Det N to the parses for "the" and "cat". We add this partial parse to the chart.
4. Scanning again: We scan the input sentence again, this time looking for matches to
predicted partial parses in the chart.
For example, if we predicted a partial parse for the span (1, 2), we look for a parse
for the exact same span in the chart. If we find a match, we can apply a grammar rule
to combine the two partial parses into a larger parse. For example, if we find a parse
for (1, 2) that matches the predicted parse for NP -> Det N, we can combine them to
create a parse for the span (1, 3) and the non-terminal symbol NP.
5. Completing: we repeat the predicting and combining steps over larger and larger spans until no new partial parses can be added to the chart.
6. Output: The final parse tree for the sentence is represented by the complete parse in the chart cell for the span (1, 5) and the non-terminal symbol S.
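A minimal sketch with NLTK's chart parser on this grammar; trace=1 prints the edges (partial parses) as they are added to the chart, mirroring the scanning, predicting, and completing steps above:

import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the'
N -> 'cat' | 'mouse'
V -> 'chased'
""")

parser = nltk.ChartParser(grammar, trace=1)          # prints chart edges as they are added
for tree in parser.parse("the cat chased the mouse".split()):
    tree.pretty_print()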
Hypergraphs can be used to represent the chart in chart parsing. Each hyperedge corresponds to the application of a grammar rule over a span of the input.
o Vertices: Positions between words (0: start, 1: after "The", 2: after "cat",
etc.).
Consider the sentence "The cat chased the mouse" and a simple CFG:
S → NP VP
NP → Det N
VP → V NP
Det → "The"
N → "cat" | "mouse"
V → "chased"
Hypergraph Representation:
Hyperedges:
o Det → "The" (0-1)
o N → "cat" (1-2)
o NP → Det N (0-2)
o V → "chased" (2-3)
o Det → "the" (3-4)
o N → "mouse" (4-5)
o NP → Det N (3-5)
o VP → V NP (2-5)
o S → NP VP (0-5)
The hypergraph compactly encodes all possible parses, and chart parsing algorithms can operate over it efficiently.
Shift-reduce parsing allows a linear-time parser but requires access to an oracle. Parsing with general CFGs in the worst case needs backtracking, and the best general worst-case algorithms (such as CKY chart parsing) run in time cubic in the length of the input. Variants of this algorithm are used in statistical parsers that attempt to search the space of all possible parses for an input sentence.
Our example CFG G is rewritten as a new CFG Gc which contains at most two non-terminals on the right-hand side of each rule. We can specialize the CFG Gc to a particular input string by creating a new CFG that represents all possible parse trees that are valid in grammar Gc for this particular input sentence.
• For the input "a and b or c" the new CFG Gf represents the forest of parse trees for that input.
Here a parsing algorithm is defined as taking as input a CFG and an input string and producing a specialized CFG that represents all legal parses for the input. A parser has to create all the valid specialized rules, from the start-symbol non-terminal that spans the entire string down to the leaf nodes that are the input tokens. Now let us look at the steps the parser has to take to construct such a specialized CFG.
Chart parsing can be more efficient than other parsing algorithms, such as
recursive descent or shift-reduce parsing, because it stores all possible partial parses
in the chart and avoids redundant parsing of the same span multiple times. Hypergraphs
can also be used in chart parsing to represent more complex structures and enable more efficient parsing algorithms.
Maximum spanning tree (MST) algorithms are often used for dependency parsing, as they provide an efficient way to find the most likely (highest-scoring) parse for a sentence given a set of scored candidate dependencies.
Here's an example of how an MST algorithm can be used for dependency parsing:
Consider the sentence "The cat chased the mouse". We can represent this sentence as
a graph with nodes for each word and edges representing the syntactic dependencies
between them.
We can use an MST algorithm to find the most likely parse for this graph. One popular
algorithm for this is the Chu-Liu/Edmonds algorithm:
1. We first remove all self-loops and multiple edges in the graph. This is because a valid
dependency tree must be acyclic and have only one edge between any two nodes.
2. We then choose a node to be the root of the tree. In this example, we can choose
"chased" to be the root since it is the main verb of the sentence.
3. We then compute the scores for each edge in the graph based on a scoring function
that takes into account the probability of each edge being a valid dependency. The
score function can be based on various linguistic features, such as part-of-speech tags
or word embeddings.
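A minimal sketch using networkx's implementation of Chu-Liu/Edmonds; the arc scores below are made up for illustration, standing in for the scoring function described in step 3:

import networkx as nx

G = nx.DiGraph()
# Candidate head -> dependent arcs with hypothetical scores (higher = more likely).
arcs = [
    ("ROOT", "chased", 10), ("chased", "cat", 8), ("cat", "The", 7),
    ("chased", "mouse", 8), ("mouse", "the", 7),
    ("cat", "chased", 2), ("mouse", "cat", 1),   # competing, lower-scored arcs
]
for head, dep, score in arcs:
    G.add_edge(head, dep, weight=score)

tree = nx.maximum_spanning_arborescence(G)       # Chu-Liu/Edmonds over the directed graph
for head, dep in tree.edges():
    print(f"{head} -> {dep}")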
2.5. Models for Ambiguity Resolution in parsing
Ambiguity resolution in parsing is a crucial aspect of Natural Language
Processing (NLP). Parsing involves analyzing the syntactic structure of a sentence, and
ambiguity arises when a sentence can be parsed in multiple ways, leading to different
interpretations. Here are some common models and techniques used for ambiguity
resolution in parsing.
1. Probabilistic Context-Free Grammars (PCFGs): a PCFG attaches a probability to every grammar rule, so each parse tree receives a probability equal to the product of the probabilities of the rules used in it.
Example: "I saw the man with the telescope."
o Ambiguity: Does "with the telescope" modify "saw" (I used the telescope to see the man) or "the man" (the man has the telescope)?
o Resolution: PCFG assigns probabilities to each parse tree and selects the most likely one based on training data.
1. Grammar Rules:
o A PCFG starts from ordinary CFG rules, for example:
S → NP VP
NP → Det N
VP → V NP
Det → "the"
N → "dog"
V → "chased"
2. Probabilities in PCFG:
o The probabilities of all rules with the same left-hand side (LHS) must sum
to 1.
o Example:
S → NP VP [0.9]
S → VP [0.1]
NP → Det N [0.6]
NP → N [0.4]
o The parser selects the parse tree with the highest probability.
Step 1: Define the PCFG
S → NP VP [0.9]
NP → Det N [0.6]
NP → N [0.4]
VP → V NP [0.7]
VP → V [0.3]
Det → "the" [1.0]
N → "dog" [0.5]
N → "cat" [0.5]
V → "chased" [1.0]
Step 2: Generate Parse Trees
1. Parse Tree 1:
S
├── NP
│ ├── Det: "the"
│ └── N: "dog"
└── VP
├── V: "chased"
└── NP
├── Det: "the"
└── N: "cat"
2. Parse Tree 2:
S
├── NP
│ └── N: "dog"
└── VP
├── V: "chased"
└── NP
├── Det: "the"
└── N: "cat"
Step 3: Compute Tree Probabilities
Parse Tree 1 ("the dog chased the cat"):
o Probability: 0.9 × 0.6 × 1.0 × 0.5 × 0.7 × 1.0 × 0.6 × 1.0 × 0.5 = 0.0567
Parse Tree 2 ("dog chased the cat"):
o Probability: 0.9 × 0.4 × 0.5 × 0.7 × 1.0 × 0.6 × 1.0 × 0.5 = 0.0378
Parse Tree 1 has a higher probability (0.0567) than Parse Tree 2 (0.0378), so it is selected as the preferred parse.
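A minimal sketch that lets NLTK do this computation: the Viterbi parser returns the tree with the highest probability under the PCFG listed in Step 1:

import nltk

pcfg = nltk.PCFG.fromstring("""
S -> NP VP [0.9] | VP [0.1]
NP -> Det N [0.6] | N [0.4]
VP -> V NP [0.7] | V [0.3]
Det -> 'the' [1.0]
N -> 'dog' [0.5] | 'cat' [0.5]
V -> 'chased' [1.0]
""")

parser = nltk.ViterbiParser(pcfg)
for tree in parser.parse("the dog chased the cat".split()):
    print(tree)          # most probable tree
    print(tree.prob())   # its probability (product of the rule probabilities used)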
Ambiguity Resolution with PCFG
Ambiguity: Does "with the telescope" modify "saw" (I used the telescope to see
the man) or "the man" (the man has the telescope)?
PCFG Resolution:
o If the training data shows that "with the telescope" is more likely to
modify "saw," the parser selects the first parse tree.
Generative models in NLP are probabilistic models that generate sentences or parse
trees by modeling the joint probability distribution of the input (e.g., words) and the
output (e.g., syntactic structure). These models are widely used in parsing to predict
the most likely syntactic structure of a sentence. Below is an explanation of generative
models for parsing, along with examples.
o Generative Models: Model the joint probability P(X, Y), where X is the
input (words) and Y is the output (parse tree). They generate data by
sampling from this distribution.
o Given a sentence, the model generates the most likely parse tree by
estimating P(Y∣X) using Bayes' rule:
P(Y∣X)=P(X∣Y)⋅P(Y)/P(X)
o Syntax Model (P(Y)): Models the probability of a parse tree (e.g., using
PCFG).
o Lexical Model (P(X∣Y)): Models the probability of words given the parse
tree.
S → NP VP [0.9]
S → VP [0.1]
NP → Det N [0.6]
NP → N [0.4]
VP → V NP [0.7]
VP → V [0.3]
Det → "the" [1.0]
N → "dog" [0.5]
N → "cat" [0.5]
V → "chased" [1.0]
The parser generates all possible parse trees for the sentence using the
PCFG rules.
For each parse tree, the model calculates the probability as the product
of the probabilities of the rules used.
The parse tree with the highest probability is selected as the output.
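A minimal sketch of this generative recipe, using an illustrative PCFG (the rules and probabilities below are assumptions, chosen so that the "saw ... with the telescope" sentence from earlier is genuinely ambiguous): all candidate trees are enumerated and scored, and the highest-probability one is kept.

import nltk
from nltk.parse.pchart import InsideChartParser

pcfg = nltk.PCFG.fromstring("""
S -> NP VP [1.0]
NP -> Det N [0.5] | Det N PP [0.2] | 'I' [0.3]
VP -> V NP [0.6] | V NP PP [0.4]
PP -> P NP [1.0]
Det -> 'the' [1.0]
N -> 'man' [0.5] | 'telescope' [0.5]
V -> 'saw' [1.0]
P -> 'with' [1.0]
""")

parser = InsideChartParser(pcfg)
trees = list(parser.parse("I saw the man with the telescope".split()))
for t in trees:
    print(t.prob(), t)                        # every tree with its probability
best = max(trees, key=lambda t: t.prob())     # the most plausible analysis
print("best:", best)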
A token is the smallest unit of text that carries meaning. Tokenization is the process of splitting a text into individual tokens, which are typically words, symbols, or subwords. Tokens are the building blocks for most NLP tasks, such as text analysis, machine translation, and sentiment analysis. For example, a contracted form such as "today's" or "There's" can be split so that it is treated as two independent tokens, today and 's, or There and 's.
What is a Token?
A token is a single meaningful unit of text, such as a word, number, punctuation mark, or subword. Tokenization is harder in some languages than in others:
1. No Explicit Word Boundaries:
o Languages like Chinese, Japanese, and Thai do not use spaces to separate words.
2. Compound Words:
o Languages like German and Finnish form compound words, which need to be split into their meaningful parts.
3. Rich Morphology:
o Languages like Arabic and Turkish have rich morphology, where a single word can carry information that other languages express with several words.
4. Script Variations:
o Different scripts and writing conventions require different tokenization rules.
5. Ambiguity:
o The same surface form can often be tokenized in more than one way (e.g., "can't" as ["can", "'t"] or ["ca", "n't"]).
In syntax parsing, tokenization, case handling, and encoding are critical preprocessing
steps that significantly impact the performance of NLP models. Let's break down each of these steps.
1. Tokenization
Tokenization is the process of splitting text into individual tokens (words, symbols, or
subwords). It is the first step in syntax parsing and directly affects how the parser interprets the sentence. Incorrect tokenization can lead to incorrect parsing (e.g., splitting "can't" into the wrong pieces).
Challenges:
Multilingual Tokenization: Languages like Chinese and Japanese don't use spaces, so word boundaries must be inferred.
Compound Words: Languages like German form long compound words that need to be split into their parts.
Example:
from nltk.tokenize import word_tokenize   # requires the 'punkt' tokenizer data: nltk.download('punkt')

text = "There's a cat on the mat."        # illustrative input text
tokens = word_tokenize(text)
print(tokens)
Output:
['There', "'s", 'a', 'cat', 'on', 'the', 'mat', '.']
2. Case Handling
Case refers to the distinction between uppercase and lowercase letters. Proper
handling of case is crucial for syntax parsing, as it can affect the interpretation of
words.
Proper Nouns: Uppercase letters often indicate proper nouns (e.g., "John",
"Paris").
Ambiguity: Case can change the meaning of words (e.g., "apple" vs. "Apple").
Challenges:
Case Insensitivity: Some languages (e.g., German) capitalize all nouns, which can
lead to ambiguity.
Lowercasing: Lowercasing text can lose important information (e.g., "US" vs.
"us").
Example:
text = "Apple shipped the phone to the US."   # illustrative input text
tokens = text.split()
lowercase_tokens = [t.lower() for t in tokens]
print(lowercase_tokens)
Output:
['apple', 'shipped', 'the', 'phone', 'to', 'the', 'us.']
3. Encoding
Encoding is the process of converting text into a form that can be processed by machines. In syntax parsing, text must be encoded into numerical vectors or other machine-readable representations (e.g., Unicode and subword ids for multilingual text).
Word Embeddings: Words are often encoded as dense vectors (e.g., Word2Vec, GloVe, BERT); see the sketch below.
Challenges: inconsistent character encodings and out-of-vocabulary words can corrupt or impoverish the parser's input.
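A minimal sketch of the simplest form of encoding: mapping each token to an integer id from a vocabulary, which is the form most parsers and neural models actually consume (embedding lookups then turn these ids into dense vectors):

tokens = ["the", "cat", "sat", "on", "the", "mat"]
vocab = {tok: idx for idx, tok in enumerate(dict.fromkeys(tokens))}   # ids in first-seen order
ids = [vocab[tok] for tok in tokens]                                  # numerical encoding
print(vocab)   # {'the': 0, 'cat': 1, 'sat': 2, 'on': 3, 'mat': 4}
print(ids)     # [0, 1, 2, 3, 0, 4]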
Here's how tokenization, case handling, and encoding fit into a typical syntax parsing pipeline:
Key Points
1. Tokenization: split the raw text into tokens.
2. Case Handling: normalize or preserve case, depending on the task.
3. Encoding: convert tokens into numerical representations.
4. Syntax Parsing: run the parser over the preprocessed, encoded tokens.
A recurring challenge for case handling: some languages (e.g., German) capitalize all nouns, which can lead to ambiguity.
Word Segmentation
Word segmentation is a critical step in syntax parsing, especially for languages that do not use spaces to separate words (e.g., Chinese, Japanese, Thai). It involves splitting a continuous sequence of characters into words, and accurate segmentation is essential for downstream NLP tasks like parsing, machine translation, and information retrieval.
Word segmentation is the process of identifying word boundaries in text. For languages like English, this is relatively straightforward because words are separated by spaces. However, for languages like Chinese, word segmentation is more complex because there are no explicit word boundaries.
Why it matters for parsing:
1. Input for Parsers: Syntax parsers rely on correctly segmented words to build parse trees.
2. Downstream Structure: Incorrect segmentation distorts the words themselves and therefore the syntactic relationships between them.
Challenges:
1. Ambiguity: the same character sequence can often be segmented in more than one way.
2. Out-of-Vocabulary Words: new words and names are hard to segment correctly.
3. Language-Specific Rules: each language has its own conventions for what counts as a word.
Approaches (see the sketch after this list):
1. Rule-Based Methods: dictionaries and hand-written rules (e.g., maximum matching).
2. Statistical Methods: probabilistic models trained on segmented corpora.
3. Neural Methods: sequence-labeling models that tag each character as beginning or continuing a word.
4. Hybrid Methods: combine dictionaries with statistical or neural models to improve accuracy.
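A minimal sketch of Chinese word segmentation using the third-party jieba library (a dictionary-plus-statistics hybrid segmenter); the exact segmentation may vary with the dictionary version:

import jieba   # pip install jieba

text = "我爱自然语言处理"                 # "I love natural language processing", written without spaces
print(list(jieba.cut(text)))              # e.g. ['我', '爱', '自然语言', '处理']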
Morphology
Morphology plays a crucial role in syntax parsing in Natural Language Processing (NLP). Morphology deals with the structure and formation of words, including how words are built from smaller units and how their form signals their grammatical role in a sentence. Parsing is harder in morphologically rich languages, since each word can contain several components, called morphemes, such that the meaning of the word is a combination of the meanings of the morphemes. A word must now be thought of as being decomposed into a stem and its affixes.
What is Morphology?
Morphology is the study of the internal structure of words and how they are formed. It involves:
o Inflection: grammatical variants of a word (e.g., "cats" = "cat" + plural; "cat" is singular).
o Derivation and word formation: creating new words or new word classes (e.g., the gerund "running" acting as a noun in "Running is fun").
How morphology helps parsing:
1. Part-of-Speech Tagging: morphological endings signal the word class (e.g., "-ing", "-ed" for verbs).
2. Dependency Parsing: agreement features (number, person, case) help link dependents to the right heads.
3. Constituency Parsing: morphological form helps group words into phrases.
o Example: In "The quick brown fox," the adjectives "quick" and "brown" modify the noun "fox" within the noun phrase.
4. Handling Ambiguity: morphological analysis helps choose between readings (e.g., "leaves" as the plural of "leaf" or the third-person singular of "leave").
Challenges:
1. Morphological Richness: languages such as Finnish or Turkish have very many inflected forms per word.
2. Agglutination: several morphemes are stacked onto a single root word.
o Example: Turkish "evlerimizde" (in our houses) → ["ev", "ler", "imiz", "de"].
3. Irregular Forms: irregular inflections (e.g., "went" for "go") cannot be derived by rule.
4. Ambiguity: the same surface form can have multiple analyses (e.g., "flies" as the plural of "fly" or the third-person singular of "fly"); a small lemmatization sketch follows this list.