Top Down Parser
The parse tree is constructed
– From the top
– From left to right
• Terminals are seen in order of
appearance in the token stream:
t2 t5 t6 t8 t9
Top-down parser
• Recursive-Descent Parsing
– Backtracking is needed (if a choice of production rule does not work, we backtrack to try the other alternatives).
– It is a general parsing technique, but not widely used.
– Not efficient.
• Predictive Parsing
– No backtracking.
– Efficient.
– Needs a special form of grammars (LL(1) grammars).
– Non-Recursive (Table-Driven) Predictive Parser is also known as LL(1) parser.
– Recursive Predictive Parsing is a special form of Recursive-Descent parsing without backtracking.
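Backtracking is needed.
It tries to find the left-most derivation.
S → aBc
B → bc | b
Input: abc
First attempt (B → bc): a matches and B consumes bc, so no input is left for the final c of S → aBc. This fails, and the parser backtracks.
Second attempt (B → b): a, b and c all match, so the parse succeeds.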
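The backtracking step above can be sketched in C. This is a minimal illustration for this one grammar, not a general parser; the helper names `match`, `parse_B`, `parse_S` and `accepts` are assumptions introduced here.

```c
#include <assert.h>
#include <string.h>

/* Grammar from the slide:  S -> a B c     B -> b c | b
 * Each function tries to match starting at in[pos] and returns the
 * position after the match, or -1 on failure. */

static int match(const char *in, int pos, char expected) {
    return (pos >= 0 && in[pos] == expected) ? pos + 1 : -1;
}

/* `alt` selects which alternative of B to try. */
static int parse_B(const char *in, int pos, int alt) {
    if (alt == 0)                                  /* B -> b c */
        return match(in, match(in, pos, 'b'), 'c');
    return match(in, pos, 'b');                    /* B -> b   */
}

/* S -> a B c, trying the alternatives of B in order. */
static int parse_S(const char *in, int pos) {
    int after_a = match(in, pos, 'a');
    if (after_a < 0) return -1;
    for (int alt = 0; alt < 2; alt++) {
        int after_B = parse_B(in, after_a, alt);
        int after_c = match(in, after_B, 'c');
        if (after_c >= 0) return after_c;          /* this choice worked */
        /* otherwise backtrack to after_a and try the next alternative */
    }
    return -1;
}

/* Accept only if S consumes the whole input. */
int accepts(const char *in) {
    assert(in != NULL);
    return parse_S(in, 0) == (int)strlen(in);
}
```

For the input abc, the first alternative B → bc leaves nothing for the trailing c, so the loop moves on to B → b, which succeeds.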
Consider the following productions:
S → aAb
A → c | cd
Let the input string be acdb.
Consider the following productions:
S → BA | AB
A → a | SA
B → b | SB
w = abab
Parse w using recursive-descent parsing and identify the problem with the recursive-descent parser.
When rewriting a non-terminal in a derivation step, a
predictive parser can uniquely choose a production rule
by just looking at the current symbol in the input string.
A → α1 | ... | αn        input: ... a .......   (a is the current token)
Unlike recursive descent with backtracking, a predictive parser can “predict”
which production to use.
– By looking at the next few tokens.
– No backtracking.
stmt → if ...... |
       while ...... |
       begin ...... |
       for .....
When we are trying to rewrite the non-terminal stmt, we can
uniquely choose the production rule by just looking at the current
token.
For example, if the current token is if, we have to choose the first production rule.
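The choice described above amounts to a single switch on the current token. The following is only a sketch: the token names and the function `choose_stmt_production` are assumptions introduced for illustration, and the bodies of the productions are left elided as in the slide.

```c
#include <assert.h>

/* Hypothetical token kinds; the names are assumptions. */
enum token { TOK_IF, TOK_WHILE, TOK_BEGIN, TOK_FOR, TOK_OTHER };

/* Returns the number (1..4) of the stmt production selected by the
 * current token, or 0 when no production applies (a syntax error). */
int choose_stmt_production(enum token current) {
    assert(current <= TOK_OTHER);
    switch (current) {
    case TOK_IF:    return 1;   /* stmt -> if ...    */
    case TOK_WHILE: return 2;   /* stmt -> while ... */
    case TOK_BEGIN: return 3;   /* stmt -> begin ... */
    case TOK_FOR:   return 4;   /* stmt -> for ...   */
    default:        return 0;   /* no rule starts with this token */
    }
}
```

Because every alternative starts with a different token, no alternative ever has to be tried and undone.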
A → BC
B → DE
D → FG
F → HI
H → xY
First(A) = {x}
Write the sets of the following:
S -> Ty
T -> AB
T -> sT
A -> aA
A -> λ
B -> bB
B -> λ
Non-recursive predictive parsing is table-driven.
It is a top-down parser.
It is also known as the LL(1) parser.
[Figure: the non-recursive predictive parser reads the input buffer, works on a stack, consults the parsing table, and produces the output.]
S→Bc|DB
B→ab|cS
D→d|ε
For this grammar:
Construct FIRST and FOLLOW Sets
Apply algorithm to calculate parse table
X      FIRST(X)       FOLLOW(X)
---------------------------------------------------
D      { d, ε }       { a, c }
B      { a, c }       { c, $ }
S      { a, c, d }    { $, c }
Bc     { a, c }
DB     { d, a, c }
ab     { a }
cS     { c }
d      { d }
ε      { ε }
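The FIRST and FOLLOW entries for S, B and D can be cross-checked with a small fixed-point computation. This is a sketch under stated assumptions: sets are stored as bit masks, lower-case letters are terminals, upper-case letters are non-terminals, an empty right-hand side stands for ε, and all identifiers (`rules`, `first_of`, `compute_sets`, the `SET_*` flags) are introduced here for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* One bit per terminal a..d, plus $ and epsilon. */
enum { SET_A = 1, SET_B = 2, SET_C = 4, SET_D = 8, SET_END = 16, SET_EPS = 32 };

struct rule { char lhs; const char *rhs; };      /* "" stands for epsilon */

static const struct rule rules[] = {
    {'S', "Bc"}, {'S', "DB"}, {'B', "ab"}, {'B', "cS"}, {'D', "d"}, {'D', ""},
};
enum { NRULES = sizeof rules / sizeof rules[0] };

static unsigned first[128], follow[128];         /* indexed by non-terminal */

static unsigned term_bit(char t) {
    assert(t >= 'a' && t <= 'd');
    return 1u << (t - 'a');
}

/* FIRST of a symbol string; includes SET_EPS if the whole string can vanish. */
static unsigned first_of(const char *s) {
    unsigned set = 0;
    for (; *s; s++) {
        unsigned char sym = (unsigned char)*s;
        if (islower(sym)) return set | term_bit(*s);
        set |= first[sym] & ~(unsigned)SET_EPS;
        if (!(first[sym] & SET_EPS)) return set;
    }
    return set | SET_EPS;
}

/* Grow both tables until nothing changes any more. */
void compute_sets(void) {
    int changed = 1;
    memset(first, 0, sizeof first);
    memset(follow, 0, sizeof follow);
    follow['S'] = SET_END;                       /* $ follows the start symbol */
    while (changed) {
        changed = 0;
        for (int i = 0; i < NRULES; i++) {
            unsigned char lhs = (unsigned char)rules[i].lhs;
            unsigned f = first[lhs] | first_of(rules[i].rhs);
            if (f != first[lhs]) { first[lhs] = f; changed = 1; }
            for (const char *p = rules[i].rhs; *p; p++) {
                unsigned char sym = (unsigned char)*p;
                if (!isupper(sym)) continue;
                unsigned rest = first_of(p + 1); /* what can follow sym here */
                unsigned g = follow[sym] | (rest & ~(unsigned)SET_EPS);
                if (rest & SET_EPS) g |= follow[lhs];
                if (g != follow[sym]) { follow[sym] = g; changed = 1; }
            }
        }
    }
}

unsigned first_set(char nt)  { return first[(unsigned char)nt]; }
unsigned follow_set(char nt) { return follow[(unsigned char)nt]; }
```

After calling `compute_sets()`, `first_set('S')` equals `SET_A | SET_C | SET_D` and `follow_set('D')` equals `SET_A | SET_C`, in agreement with the table.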
Parsing table (partially filled; each entry shows the right-hand side of the production):
       a       b       c       d       $
S      Bc              Bc      DB
       DB              DB
B
D      ε               ε
Finish filling in the table.
input buffer
– the string to be parsed. We assume that its end is marked with a special symbol $.
stack
– contains the grammar symbols.
– at the bottom of the stack there is a special end-marker symbol $.
– initially the stack contains only the symbol $ and the start symbol S ($S is the initial stack).
– when the stack is emptied (i.e. only $ is left in the stack), parsing is completed.
output
– a production rule representing a step of the left-most derivation of the string in the input buffer.
parsing table
– a two-dimensional array M[A,a]
– each row is a non-terminal symbol
– each column is a terminal symbol or the special symbol $
– each entry holds a production rule.
The symbol at the top of the stack (say X) and the
current symbol in the input string (say a) determine the
parser action.
There are four possible parser actions.
1. If X and a are both $, the parser halts (successful completion).
2. If X and a are the same terminal symbol (different from $), the parser pops X from the stack and moves to the next symbol in the input buffer.
3. If X is a non-terminal, the parser looks at the parsing table entry M[X,a]. If M[X,a] holds a production rule X → Y1Y2...Yk, it pops X from the stack and pushes Yk,Yk-1,...,Y1 onto the stack. The parser also outputs the production rule X → Y1Y2...Yk to represent a step of the derivation.
4. None of the above: error.
– All empty entries in the parsing table are errors.
– If X is a terminal symbol different from a, this is also an error case.
Input: id+id
stack        input      output
$E           id+id$     E → TE’
$E’T         id+id$     T → FT’
$E’T’F       id+id$     F → id
$E’T’id      id+id$
$E’T’        +id$       T’ → ε
$E’          +id$       E’ → +TE’
$E’T+        +id$
$E’T         id$        T → FT’
$E’T’F       id$        F → id
$E’T’id      id$
$E’T’        $          T’ → ε
$E’          $          E’ → ε
$            $          accept
[Figure: parse tree for id+id]
S → aBa
B → bB | ε
w = abba
LL(1) Parsing Table
       a           b           $
S      S → aBa
B      B → ε       B → bB
stack    input    output
$S       abba$    S → aBa
$aBa     abba$
$aB      bba$     B → bB
$aBb     bba$
$aB      ba$      B → bB
$aBb     ba$
$aB      a$       B → ε
$a       a$
$        $        accept, successful completion
Outputs: S → aBa, B → bB, B → bB, B → ε
Derivation (left-most): S ⇒ aBa ⇒ abBa ⇒ abbBa ⇒ abba
Parse tree:
      S
    / | \
   a  B  a
     / \
    b   B
       / \
      b   B
          |
          ε
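The trace above can be replayed with a short table-driven driver. This is a sketch for the grammar S → aBa, B → bB | ε only; the function names `table` and `ll1_parse`, the fixed buffer sizes, and the encoding of the table as a function are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Parsing table M[X, a] for  S -> aBa,  B -> bB | epsilon.
 * Returns the right-hand side to push, or NULL for an empty (error) entry.
 * The empty string "" encodes the epsilon production. */
static const char *table(char nonterminal, char lookahead) {
    if (nonterminal == 'S' && lookahead == 'a') return "aBa";
    if (nonterminal == 'B' && lookahead == 'b') return "bB";
    if (nonterminal == 'B' && lookahead == 'a') return "";
    return NULL;
}

/* Returns 1 if w (given without the end marker) is accepted, else 0. */
int ll1_parse(const char *w) {
    char stack[256], input[256];
    size_t top = 0, n;

    assert(w != NULL && strlen(w) < sizeof input - 1);
    n = strlen(w);
    memcpy(input, w, n);
    input[n] = '$';                              /* append the end marker */
    input[n + 1] = '\0';

    stack[top++] = '$';                          /* initial stack: $S */
    stack[top++] = 'S';
    const char *ip = input;
    for (;;) {
        char X = stack[top - 1], a = *ip;
        if (X == 'S' || X == 'B') {              /* action 3: expand */
            const char *rhs = table(X, a);
            if (rhs == NULL) return 0;           /* empty entry: error */
            top--;                               /* pop X */
            size_t len = strlen(rhs);
            if (top + len > sizeof stack) return 0;
            for (size_t i = len; i > 0; i--)
                stack[top++] = rhs[i - 1];       /* push Yk ... Y1 */
        } else if (X == '$' && a == '$') {
            return 1;                            /* action 1: accept */
        } else if (X == a) {
            top--;                               /* action 2: match */
            ip++;
        } else {
            return 0;                            /* action 4: error */
        }
    }
}
```

Calling `ll1_parse("abba")` goes through the same stack contents as the trace; an input such as abab is rejected once the stack holds only $ while input remains.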
PROGRAM → begin DECLIST comma STATELIST end
DECLIST → d semi DECLIST
DECLIST → d
STATELIST → s semi STATELIST
STATELIST → s
After left factoring, the grammar is changed to
PROGRAM → begin DECLIST comma STATELIST end
DECLIST → dX
X → semi DECLIST | ε
STATELIST → sY
Y → semi STATELIST | ε
First(X) = {semi, ε}    Follow(X) = {comma}
First(Y) = {semi, ε}    Follow(Y) = {end}
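Write a function for each non-terminal.
/* lexical() returns the next token and error() reports a syntax error;
   both, like the token codes, are assumed to be defined elsewhere. */
int token;                              /* current lookahead token */
void PROGRAM(), DECLIST(), X(), STATELIST(), Y();

int main()
{
    token = lexical();                  /* read the first token */
    PROGRAM();
    return 0;
}

/* PROGRAM → begin DECLIST comma STATELIST end */
void PROGRAM()
{
    if (token != begin) error();
    token = lexical();
    DECLIST();
    if (token != comma) error();
    token = lexical();
    STATELIST();
    if (token != end) error();
}

/* DECLIST → dX */
void DECLIST()
{
    if (token != d) error();
    token = lexical();
    X();
}

/* X → semi DECLIST | ε */
void X()
{
    if (token == semi)
    {
        token = lexical();
        DECLIST();
    }
    else if (token != comma)            /* ε is allowed only before Follow(X) = {comma} */
        error();
}

/* STATELIST → sY */
void STATELIST()
{
    if (token != s) error();
    token = lexical();
    Y();
}

/* Y → semi STATELIST | ε */
void Y()
{
    if (token == semi)
    {
        token = lexical();
        STATELIST();
    }
    else if (token != end)              /* ε is allowed only before Follow(Y) = {end} */
        error();
}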
Change the productions into an extended notation that
includes the * (repetition) operator.
PROGRAM → begin DECLIST comma STATELIST end
DECLIST → d (semi d)*
STATELIST → s (semi s)*
/* DECLIST → d (semi d)* */
void DECLIST()
{
    if (token != d) error();
    token = lexical();
    while (token == semi)
    {
        token = lexical();
        if (token != d) error();
        token = lexical();
    }
}

/* STATELIST → s (semi s)* */
void STATELIST()
{
    if (token != s) error();
    token = lexical();
    while (token == semi)
    {
        token = lexical();
        if (token != s) error();
        token = lexical();
    }
}
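To try the iterative version end to end, the two functions can be combined with a toy lexer. This is a sketch under assumptions: tokens are words separated by spaces, `error()` only records a failure instead of stopping, and the names `parse_program`, `cursor`, `ok` and the `T_*` token codes are introduced here and are not part of the slides.

```c
#include <assert.h>
#include <string.h>

/* Token codes; T_EOI marks end of input, T_BAD an unknown word. */
enum tok { T_BEGIN, T_END, T_D, T_S, T_SEMI, T_COMMA, T_EOI, T_BAD };

static const char *cursor;   /* rest of the input text */
static enum tok token;       /* current lookahead token */
static int ok;               /* cleared by error() */

static void error(void) { ok = 0; }

/* Assumed lexer: tokens are words separated by spaces. */
static enum tok lexical(void) {
    static const char *names[] = {"begin", "end", "d", "s", "semi", "comma"};
    while (*cursor == ' ') cursor++;
    if (*cursor == '\0') return T_EOI;
    size_t len = strcspn(cursor, " ");
    const char *word = cursor;
    cursor += len;
    for (int i = 0; i < 6; i++)
        if (strlen(names[i]) == len && strncmp(word, names[i], len) == 0)
            return (enum tok)i;          /* names[] follows the enum order */
    return T_BAD;
}

static void DECLIST(void) {              /* DECLIST -> d (semi d)* */
    if (token != T_D) error();
    token = lexical();
    while (token == T_SEMI) {
        token = lexical();
        if (token != T_D) error();
        token = lexical();
    }
}

static void STATELIST(void) {            /* STATELIST -> s (semi s)* */
    if (token != T_S) error();
    token = lexical();
    while (token == T_SEMI) {
        token = lexical();
        if (token != T_S) error();
        token = lexical();
    }
}

static void PROGRAM(void) {              /* begin DECLIST comma STATELIST end */
    if (token != T_BEGIN) error();
    token = lexical();
    DECLIST();
    if (token != T_COMMA) error();
    token = lexical();
    STATELIST();
    if (token != T_END) error();
    token = lexical();
}

/* Returns 1 if the whole text is a valid PROGRAM, else 0. */
int parse_program(const char *text) {
    assert(text != NULL);
    cursor = text;
    ok = 1;
    token = lexical();
    PROGRAM();
    return ok && token == T_EOI;
}
```

For example, `parse_program("begin d semi d comma s end")` succeeds, while a dangling semi as in "begin d semi comma s end" is reported as an error.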
Removal of recursion is not always possible. A context-free
grammar might contain middle recursion, and this
cannot be replaced by iteration. For example:
E → E ‘+’ T
E → T
T → T ‘*’ F
T → F
F → ‘(’ E ‘)’
F → ‘x’
Transforming the grammar into LL(1):
E → TX
X → ‘+’ TX | ε
T → FY
Y → ‘*’ FY | ε
F → ‘(’ E ‘)’ | ‘x’
Replacing recursion by iteration, where possible, we
have
E → T (‘+’ T)*
T → F (‘*’ F)*
F → ‘(’ E ‘)’ | ‘x’
/* E → T (‘+’ T)* */
void E()
{
    T();
    while (token == plus)
    {
        token = lexical();
        T();
    }
}
/* T → F (‘*’ F)* */
void T()
{
    F();
    while (token == times)
    {
        token = lexical();
        F();
    }
}
/* F → ‘(’ E ‘)’ | ‘x’ */
void F()
{
    if (token == obracket)
    {
        token = lexical();
        E();                            /* middle recursion: stays a recursive call */
        if (token == cbracket)
            token = lexical();
        else
            error();
    }
    else if (token == x)
        token = lexical();
    else
        error();
}
int main()
{
    token = lexical();                  /* read the first token */
    E();
    return 0;
}
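A complete version of this recognizer, small enough to compile and try, might look as follows. It is a sketch under assumptions: every character is its own token, spaces are skipped, `error()` only sets a flag, and the names `parse_expr`, `cursor` and `ok` are introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>

static const char *cursor;   /* next unread character */
static char token;           /* current lookahead; '\0' at end of input */
static int ok;               /* cleared on the first syntax error */

static void error(void) { ok = 0; }

/* Assumed lexer: each character is one token; spaces are skipped. */
static char lexical(void) {
    while (*cursor == ' ') cursor++;
    return *cursor ? *cursor++ : '\0';
}

static void E(void);

static void F(void) {                    /* F -> '(' E ')' | 'x' */
    if (token == '(') {
        token = lexical();
        E();                             /* middle recursion stays recursive */
        if (token == ')') token = lexical();
        else error();
    } else if (token == 'x') {
        token = lexical();
    } else {
        error();
    }
}

static void T(void) {                    /* T -> F ('*' F)* */
    F();
    while (token == '*') {
        token = lexical();
        F();
    }
}

static void E(void) {                    /* E -> T ('+' T)* */
    T();
    while (token == '+') {
        token = lexical();
        T();
    }
}

/* Returns 1 if the whole text is a valid expression, else 0. */
int parse_expr(const char *text) {
    assert(text != NULL);
    cursor = text;
    ok = 1;
    token = lexical();
    E();
    return ok && token == '\0';
}
```

The loops in `E` and `T` replace the tail recursion of X and Y, while the call to `E` inside `F` remains, which is the middle recursion that iteration cannot remove.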