Context Free Grammars
Introduction
• Finite automata accept all regular languages and only
regular languages
• Many simple languages are not regular:
- {a^n b^n : n = 0, 1, 2, …}
- {w : w is a palindrome}
and there is no finite automaton that accepts them.
• Context-free languages are a larger class of languages that
encompasses all regular languages and many others, including
the two above.
Context-Free Grammars
• Languages that are generated by context-free grammars
are context-free languages
• Context-free grammars are more expressive than finite
automata: if a language L is accepted by a finite automaton
then L can be generated by a context-free grammar
• Beware: the converse is NOT true
Context-Free Grammar
Definition. A context-free grammar is a 4-tuple (Σ, NT, R, S),
where:
• Σ is an alphabet (each character in Σ is called a terminal)
• NT is a set (each element in NT is called a nonterminal)
• R, the set of rules, is a subset of NT × (Σ ∪ NT)*
If (α, β) ∈ R, we write the production α → β
β is called a sentential form
• S, the start symbol, is one of the symbols in NT
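The 4-tuple can be written down directly as a data structure. The following is a minimal sketch, not a standard library API: the names CFG, is_well_formed and ANBN are hypothetical, rule bodies are stored as tuples of symbols, and the example grammar S → aSb | ε for {a^n b^n} is an assumption added for illustration. The disjointness check on terminals and nonterminals is also an extra assumption that the definition above does not state.

```python
from typing import NamedTuple


class CFG(NamedTuple):
    terminals: frozenset      # the alphabet (Sigma)
    nonterminals: frozenset   # NT
    rules: frozenset          # R: pairs (head, body), body is a tuple of symbols
    start: str                # S


def is_well_formed(g: CFG) -> bool:
    """Check the side conditions of the 4-tuple definition."""
    symbols = g.terminals | g.nonterminals
    return (
        g.terminals.isdisjoint(g.nonterminals)   # assumption, see lead-in
        and g.start in g.nonterminals            # S is one of the symbols in NT
        and all(
            head in g.nonterminals and all(s in symbols for s in body)
            for head, body in g.rules            # R is a subset of NT x (Sigma U NT)*
        )
    )


# Hypothetical example grammar: S -> aSb | epsilon (empty tuple = epsilon)
ANBN = CFG(
    terminals=frozenset("ab"),
    nonterminals=frozenset("S"),
    rules=frozenset({("S", ("a", "S", "b")), ("S", ())}),
    start="S",
)
```

Using a frozenset of pairs for the rules mirrors the definition of R as a relation; a dictionary from heads to lists of bodies would be an equally reasonable choice.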
CFGs: Alternate Definition
Many textbooks use different symbols and terms to describe CFGs:
G = (V, Σ, P, S)
V = variables, a finite set
Σ = alphabet or terminals, a finite set
P = productions, a finite set
S = start variable, S ∈ V
Productions have the form A → α, where A ∈ V and α ∈ (V ∪ Σ)*
Derivations
Definition. v is one-step derivable from u, written u ⇒ v, if:
• u = xαz
• v = xβz
• α → β is in R
Definition. v is derivable from u, written u ⇒* v, if:
there is a chain of one-step derivations of the form:
u ⇒ u1 ⇒ u2 ⇒ … ⇒ v
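The two definitions translate almost literally into code. This is a sketch under stated assumptions: every symbol is a single character, sentential forms are plain strings, and RULES is a hypothetical grammar (S → aSb | ε) that is not taken from the slides. The function derivable only searches up to a step bound, so it is a bounded check and not a decision procedure.

```python
# Hypothetical rule set: S -> aSb | epsilon ("" stands for epsilon)
RULES = [("S", "aSb"), ("S", "")]


def one_step(u, rules):
    """Return every v that is one-step derivable from u.

    u is split as x + head + z and the head is replaced by the rule body.
    """
    results = set()
    for i, symbol in enumerate(u):
        for head, body in rules:
            if symbol == head:
                results.add(u[:i] + body + u[i + 1:])
    return results


def derivable(u, v, rules, max_steps):
    """Bounded check: is there a chain of at most max_steps steps from u to v?"""
    frontier, seen = {u}, {u}
    for _ in range(max_steps + 1):
        if v in frontier:
            return True
        # Forms reachable with exactly one more step and not seen before
        frontier = {w for x in frontier for w in one_step(x, rules)} - seen
        seen |= frontier
    return False
```

For example, one_step("aSb", RULES) contains both "aaSbb" and "ab", and "aabb" is reachable from "S" in three steps but not in two.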
Context-Free Languages
Definition. Given a context-free grammar
G = (Σ, NT, R, S), the language generated by or
derived from G is the set:
L(G) = {w ∈ Σ* : S ⇒* w}
Definition. A language L is context-free if there is a
context-free grammar G = (Σ, NT, R, S) such that L is
generated by G
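To make the set L(G) concrete, one can enumerate all of its words up to a length bound. The sketch below assumes single-character symbols and uses a hypothetical grammar S → aSb | ε for {a^n b^n}; the function name language_up_to is also made up for illustration. It always expands the leftmost nonterminal, which loses no words, and prunes forms with too many terminals. That pruning only guarantees termination for grammars whose sentential forms cannot accumulate nonterminals indefinitely.

```python
from collections import deque


def language_up_to(rules, start, nonterminals, max_len):
    """Collect every terminal string w with start =>* w and len(w) <= max_len."""
    words = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, or None if the form is a word
        idx = next((i for i, s in enumerate(form) if s in nonterminals), None)
        if idx is None:
            words.add(form)
            continue
        for head, body in rules:
            if head != form[idx]:
                continue
            new = form[:idx] + body + form[idx + 1:]
            n_terminals = sum(1 for s in new if s not in nonterminals)
            if n_terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words


# Hypothetical grammar: S -> aSb | epsilon
ANBN_RULES = [("S", "aSb"), ("S", "")]
WORDS = language_up_to(ANBN_RULES, "S", {"S"}, 4)
```

With the bound 4, WORDS holds the empty word, "ab" and "aabb", matching {a^n b^n : n = 0, 1, 2}.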
Parse Tree
A parse tree of a derivation is a tree in which:
• Each internal node is labeled with a nonterminal
• If a rule A → A1A2…An occurs in the derivation, then A is
a parent node of nodes labeled A1, A2, …, An
[Figure: example parse tree with root S, interior nodes labeled S, and leaves a, a, b, ε]
Parse Trees
Grammar:
S → A | AB
A → ε | a | Ab | AA
B → b | bc | Bc | bB
Sample derivations:
S ⇒ AB ⇒ AAB ⇒ aAB ⇒ aaB ⇒ aabB ⇒ aabb
S ⇒ AB ⇒ AbB ⇒ Abb ⇒ AAbb ⇒ Aabb ⇒ aabb
These two derivations use the same productions, but in different orders.
This ordering difference is often uninteresting.
Derivation trees give a way to abstract away ordering differences.
        S
      /   \
     A     B
    / \   / \
   A   A b   B
   |   |     |
   a   a     b
Root label = start symbol.
Each interior label = variable.
Each parent/child relation = derivation step.
Each leaf label = terminal or ε.
All leaf labels together = derived string = yield.
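The derivation tree for aabb can be encoded as nested tuples to check the properties listed above. This is only a sketch: the (label, children) encoding, the helper names, and the use of "" as the label of an ε leaf are all assumptions made for illustration. The grammar is the one on this slide (S → A | AB, A → ε | a | Ab | AA, B → b | bc | Bc | bB).

```python
# Grammar from the slide; "" stands for epsilon
GRAMMAR = {
    "S": ["A", "AB"],
    "A": ["", "a", "Ab", "AA"],
    "B": ["b", "bc", "Bc", "bB"],
}


def leaf(symbol):
    """A leaf node: a terminal, or "" for epsilon."""
    return (symbol, [])


def tree_yield(tree):
    """Concatenate all leaf labels from left to right."""
    label, children = tree
    if not children:
        return label
    return "".join(tree_yield(child) for child in children)


def follows_grammar(tree):
    """Each parent/child relation must correspond to one production."""
    label, children = tree
    if not children:
        return label not in GRAMMAR          # leaves are terminals or epsilon
    body = "".join(child[0] for child in children)
    return body in GRAMMAR.get(label, []) and all(
        follows_grammar(child) for child in children
    )


# The tree S(A(A(a), A(a)), B(b, B(b)))
TREE = ("S", [
    ("A", [("A", [leaf("a")]), ("A", [leaf("a")])]),
    ("B", [leaf("b"), ("B", [leaf("b")])]),
])
```

Both sample derivations of aabb correspond to this single tree, which is why the tree abstracts away the order of the steps.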
Leftmost, Rightmost Derivations
Definition. A left-most derivation of a sentential form is one
in which, at every step, a rule is applied to the left-most
nonterminal
Definition. A right-most derivation of a sentential form is
one in which, at every step, a rule is applied to the right-most
nonterminal
Leftmost & Rightmost Derivations
Grammar:
S → A | AB
A → ε | a | Ab | AA
B → b | bc | Bc | bB
Sample derivations:
S ⇒ AB ⇒ AAB ⇒ aAB ⇒ aaB ⇒ aabB ⇒ aabb
S ⇒ AB ⇒ AbB ⇒ Abb ⇒ AAbb ⇒ Aabb ⇒ aabb
        S
      /   \
     A     B
    / \   / \
   A   A b   B
   |   |     |
   a   a     b
These two derivations are special.
1st derivation is leftmost: it always picks the leftmost variable.
2nd derivation is rightmost: it always picks the rightmost variable.
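Whether a given derivation is leftmost or rightmost can be checked mechanically, step by step. The sketch below assumes single-character symbols and represents a derivation as a list of sentential forms; the helper names are hypothetical. The grammar and the two derivations are the ones on this slide (S → A | AB, A → ε | a | Ab | AA, B → b | bc | Bc | bB).

```python
# Grammar from the slide; "" stands for epsilon
GRAMMAR = {
    "S": ["A", "AB"],
    "A": ["", "a", "Ab", "AA"],
    "B": ["b", "bc", "Bc", "bB"],
}


def nonterminal_positions(form):
    return [i for i, s in enumerate(form) if s in GRAMMAR]


def is_step_at(u, v, i):
    """True if v results from rewriting the nonterminal at position i of u."""
    return any(v == u[:i] + body + u[i + 1:] for body in GRAMMAR[u[i]])


def check(derivation, pick):
    """Every step must rewrite the nonterminal chosen by pick."""
    for u, v in zip(derivation, derivation[1:]):
        positions = nonterminal_positions(u)
        if not positions or not is_step_at(u, v, pick(positions)):
            return False
    return True


def is_leftmost(derivation):
    return check(derivation, lambda positions: positions[0])


def is_rightmost(derivation):
    return check(derivation, lambda positions: positions[-1])


FIRST = ["S", "AB", "AAB", "aAB", "aaB", "aabB", "aabb"]
SECOND = ["S", "AB", "AbB", "Abb", "AAbb", "Aabb", "aabb"]
```

Here is_leftmost(FIRST) and is_rightmost(SECOND) hold, while is_rightmost(FIRST) and is_leftmost(SECOND) do not.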
Left / Rightmost Derivations
In proofs…
Restrict attention to left- or rightmost derivations.
In parsing algorithms…
Restrict attention to left- or rightmost derivations.
E.g., recursive descent uses leftmost derivations; yacc builds a rightmost derivation in reverse.
Derivation Trees
S → A | AB
A → ε | a | Ab | AA
B → b | bc | Bc | bB
w = aabb
Other derivation trees for this string?
[Figure: several alternative derivation trees for aabb]
Infinitely many others possible (for example, A → AA followed by
A → ε can be inserted any number of times).
Ambiguous Grammar
Definition. A grammar G is ambiguous if there is a word
w ∈ L(G) having at least two different parse trees
S → A
S → B
S → AB
A → aA
B → bB
A → ε
B → ε
Notice that a has at least two left-most derivations:
S ⇒ A ⇒ aA ⇒ a and S ⇒ AB ⇒ aAB ⇒ aB ⇒ a
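The claim about the word a can be checked by enumerating leftmost derivations. This is a minimal sketch, assuming single-character symbols and reading the rules on this slide as S → A | B | AB, A → aA | ε, B → bB | ε; the function name is hypothetical. The pruning relies on the fact that, for this grammar, every rule other than an ε-rule adds a terminal, so the search is finite.

```python
# Rules from the slide; "" stands for epsilon
RULES = {"S": ["A", "B", "AB"], "A": ["aA", ""], "B": ["bB", ""]}


def leftmost_derivations(form, target):
    """Yield every leftmost derivation (a list of forms) from form to target."""
    idx = next((i for i, s in enumerate(form) if s in RULES), None)
    if idx is None:
        if form == target:
            yield [form]
        return
    # Prune: the terminal prefix must match, and terminals never disappear
    n_terminals = sum(1 for s in form if s not in RULES)
    if not target.startswith(form[:idx]) or n_terminals > len(target):
        return
    for body in RULES[form[idx]]:
        new = form[:idx] + body + form[idx + 1:]
        for rest in leftmost_derivations(new, target):
            yield [form] + rest


DERIVATIONS_OF_A = list(leftmost_derivations("S", "a"))
```

DERIVATIONS_OF_A contains exactly the two derivations S ⇒ A ⇒ aA ⇒ a and S ⇒ AB ⇒ aAB ⇒ aB ⇒ a, so the grammar is ambiguous.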
Ambiguity
A CFG is ambiguous ⇔ any of the following equivalent
statements holds:
∃ string w with multiple derivation trees.
∃ string w with multiple leftmost derivations.
∃ string w with multiple rightmost derivations.
This defines ambiguity of a grammar, not of a language.
Ambiguity & Disambiguation
Given an ambiguous grammar, we would like an
equivalent unambiguous grammar.
Allows you to know more about the structure of a given
derivation.
Simplifies inductive proofs on derivations.
Can lead to more efficient parsing algorithms.
In programming languages, we want to impose a
canonical structure on derivations. E.g., for 1+2×3.
Strategy: Force an ordering on all derivations.
Disambiguation: Example 1
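Ambiguous grammar:
Exp → n
    | Exp + Exp
    | Exp × Exp
What is an equivalent unambiguous grammar?
Exp → Term
    | Term + Exp
Term → n
    | n × Term
Uses operator precedence (× binds tighter than +) and associativity.
As written, the right-recursive rules Term + Exp and n × Term group
repeated operators to the right; a left-associative version would use
the left-recursive rules Exp → Exp + Term and Term → Term × n.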
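The unambiguous grammar Exp → Term | Term + Exp, Term → n | n × Term maps directly onto a recursive descent parser, with one function per nonterminal. This is a sketch under stated assumptions: n is a single digit, the ASCII character * stands in for ×, and the function names and the tuple encoding of trees are hypothetical.

```python
def parse_exp(tokens, i=0):
    """Exp -> Term | Term + Exp. Returns (tree, next_index)."""
    left, i = parse_term(tokens, i)
    if i < len(tokens) and tokens[i] == "+":
        right, i = parse_exp(tokens, i + 1)
        return ("+", left, right), i
    return left, i


def parse_term(tokens, i):
    """Term -> n | n * Term. Returns (tree, next_index)."""
    if i >= len(tokens) or not tokens[i].isdigit():
        raise SyntaxError(f"expected a number at position {i}")
    left = int(tokens[i])
    i += 1
    if i < len(tokens) and tokens[i] == "*":
        right, i = parse_term(tokens, i + 1)
        return ("*", left, right), i
    return left, i


def parse(text):
    """Parse a whole string; every input has at most one tree."""
    tokens = list(text.replace(" ", ""))   # single-character tokens only
    tree, i = parse_exp(tokens)
    if i != len(tokens):
        raise SyntaxError(f"unexpected token at position {i}")
    return tree
```

For instance, parse("1+2*3") returns ("+", 1, ("*", 2, 3)), so the multiplication is grouped first. Because the rules are right-recursive, parse("1+2+3") groups to the right as ("+", 1, ("+", 2, 3)).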