9-Syntax_part1
9-Syntax_part1
Spring Semester
Natural Language
Processing
Dr. Wafaa Samy
Dr. Hanaa Eissa
Syntax (Part 1)
Lecture (9)
2
Contents
• Syntax
• Context-Free Grammar (CFG)
o Derivation
o Parsing
o Grammatical and Ungrammatical Sentences
3
Syntax
• Study the order of words in a sentence and their
relationships.
o Syntax defines word categories and functions.
5
Context-Free Grammars
• Context-Free Grammars (CFGs).
o Also known as:
Phrase structure grammars.
Backus-Naur Form (BNF).
• CFG consist of: Rules, Terminals, and Non-terminals
1. Terminals:
o We’ll take these to be words (for now).
2. Non-Terminals:
o The constituents in a language.
Like noun phrase, verb phrase, verb, noun, sentence, etc.
3. Rules:
o Rules are equations that consist of a single non-terminal on the left
and any number of terminals and non-terminals on the right.
6
Some NP Rules
• Here are some rules for our noun phrases (NP):
8
Formal Definition
9
Context-Free Grammar
• A context-free grammar consists of:
1. A set of production rules, each of which expresses the ways
that symbols of the language can be grouped and ordered
together, and
2. A lexicon of words and symbols.
10
Derivation and Parsing
• A derivation is a sequence of rules applied to a string
that accounts for that string:
o Covers all the elements in the string.
o Covers only the elements in the string.
11
Example (1): Derivation
If A → α is a production rule, and β and γ are any strings in the set (Σ U V)*, then
we say that β A γ directly derives β α γ, written as:
βAγ⇒βαγ
13
Language Defined by CFG
• The formal language defined by a CFG is the set of strings
that are derivable from the designated start symbol.
14
Parsing
• Parsing (or Syntactic parsing) is the process of taking
a string and a grammar and returning a (multiple?)
parse tree(s) for that string.
• It is completely analogous to running a finite-state
transducer with a tape.
o It’s just more powerful:
Remember this means that there are languages we
can capture with CFGs that we can’t capture with
finite-state methods.
15
Example (2): Parsing
• Parse tree:
Parse Tree
17
Question (1)
• Which of the following rules is NOT a correct
context-free rule, where non-terminals are
{A, B}, and terminals are {a, b}.
1. A a
2. A a | bA
3. AB a
4. a bA
5. Both 3 and 4
18
Question (1): Solution
• Which of the following rules is NOT a correct
context-free rule, where non-terminals are
{A, B}, and terminals are {a, b}.
1. A a
2. A a | bA
3. AB a
4. a bA
5. Both 3 and 4
19
Question (2)
• Given {a, the, study} ∈ terminals and
{VP, Verb, Det} ∈ non-terminals, which of the
following production rules is NOT a correct
CFG rule:
1. Det → a | the
2. VP → Verb
3. Verb → study
4. a → Det
20
Question (2): Solution
• Given {a, the, study} ∈ terminals and
{VP, Verb, Det} ∈ non-terminals, which of the
following production rules is NOT a correct
CFG rule:
1. Det → a | the
2. VP → Verb
3. Verb → study
4. a → Det
21
Question (3)
• Given {x, y} is a set of terminals and A is a
non-terminal, which of the following words is
correctly generated from the grammar with a
rule: A → x | yA
1. xyyy
2. yxyxyx
3. yyyx
4. xyy
22
Question (3): Solution
• Given {x, y} is a set of terminals and A is a
non-terminal, which of the following words is
correctly generated from the grammar with a
rule: A → x | yA
1. xyyy
2. yxyxyx
3. yyyx
4. xyy
23
Grammatical and Ungrammatical
Sentences
• A CFG defines a formal language ( a set of strings ).
24
Grammatical and Ungrammatical
Sentences (Cont.)
• Sentences that cannot be derived by a given formal
grammar are not in the language defined by that grammar,
and are referred to as ungrammatical sentences.