Unit III Syntax Analysis
Top-down Parsing
• Top-down parsing (TDP) with backtracking
Brute-force method
• TDP without backtracking
Recursive descent parser
Predictive parser (LL(1), LL(k))
Left Recursion
• A production of the grammar is said to be left recursive if the leftmost symbol of its RHS is the same
nonterminal as its LHS,
e.g. S -> Sa | ε
• Left recursion is a problem for top-down parsers: the procedure for S would call itself again before
consuming any input, so the parser can loop forever.
• Therefore, left recursion has to be eliminated from the grammar.
• Left recursion is eliminated by converting the grammar into an equivalent right-recursive grammar.
• The general form of a left-recursive pair of productions is
A -> A α | β
where β does not begin with A. Left recursion is eliminated by replacing this pair of productions with
A -> β A’
A’ -> α A’ | ε
Right Recursion
• A production of the grammar is said to be right recursive if the rightmost
symbol of its RHS is the same nonterminal as its LHS,
e.g. S -> aS | ε
• Right recursion does not create any problem for top-down parsers.
• Therefore, there is no need to eliminate right recursion from the
grammar.
Left Recursion
• Steps to eliminate (immediate) left recursion from the grammar:
1. Identify the left-recursive productions
A -> A α | β
2. Create a new nonterminal, A’
3. Rewrite the productions as
A -> β A’
A’ -> α A’ | ε
4. Final grammar
A -> β A’
A’ -> α A’ | ε
Note:
The resulting right-recursive grammar generates the same language as the left-recursive grammar, so the same
steps are applied to every left-recursive nonterminal in the grammar.
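As a small illustration of why the transformed grammar suits a top-down parser, the sketch below parses the concrete instance A -> A a | b (so α = a and β = b) after rewriting it as A -> b A’ and A’ -> a A’ | ε. The function names, the global cursor and the single-character terminals are assumptions made for this example only. A procedure written directly from A -> A a would call itself before consuming any input and never return.

```c
#include <assert.h>
#include <stdbool.h>

const char *cursor;              /* next unread input character */

/* A' -> a A' | ε */
bool parse_A_prime(void)
{
    if (*cursor == 'a') {        /* lookahead selects A' -> a A' */
        cursor++;
        return parse_A_prime();
    }
    return true;                 /* otherwise A' -> ε: consume nothing */
}

/* A -> b A' */
bool parse_A(void)
{
    if (*cursor != 'b')
        return false;
    cursor++;
    return parse_A_prime();
}

/* Accepts exactly the strings b, ba, baa, ... */
bool accepts(const char *input)
{
    assert(input != 0);
    cursor = input;
    return parse_A() && *cursor == '\0';
}
```

For instance, accepts("baa") consumes b in parse_A and then one a per call of parse_A_prime, so every recursive call happens only after some input has been consumed.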
Left Factoring
• Left factoring is a grammar transformation that is useful for producing a
grammar suitable for predictive (top-down) parsing.
• When the choice between two alternative productions is not clear, we
defer the decision until enough of the input has been seen to make the
right choice of production.
• The choice of production for expanding a nonterminal is unclear when
multiple alternatives begin with the same terminal or nonterminal, so the
lookahead symbol cannot tell them apart.
• Such a grammar is called a non-deterministic grammar, or a grammar
with common prefixes.
• The general form of a grammar that needs left factoring is
A -> α β1 | α β2 | α β3 | ε
• The first three alternatives have α as a common prefix. To derive α β3, a
backtracking parser may first try α β1 and α β2 and has to backtrack in the
tree each time.
• With a large number of productions, this backtracking becomes a very
time-consuming process.
• The backtracking is caused by the common prefix shared by two or more
alternatives on the RHS.
• A non-deterministic grammar is converted to a deterministic grammar by factoring out the
common prefix:
A -> α A’ | ε
A’ -> β1 | β2 | β3
• The choice among β1, β2 and β3 is postponed until α has been matched.
• To derive α β3:
A => α A’ => α β3
This produces β3 directly, without backtracking.
• The procedure used to convert a non-deterministic grammar to a deterministic grammar is
called left factoring.
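• INPUT: Grammar G
• OUTPUT: An equivalent left-factored grammar
• METHOD: For each nonterminal A, find the longest prefix α common to two or more of its
alternatives. If α is not ε, replace A -> α β1 | α β2 | ... | α βn | γ (γ stands for the alternatives
that do not begin with α) by A -> α A’ | γ and A’ -> β1 | β2 | ... | βn. Repeat until no two
alternatives for a nonterminal have a common prefix.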
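To make the effect of left factoring concrete, the sketch below parses the small instance S -> a b | a c (α = a, β1 = b, β2 = c) after factoring it into S -> a S’ and S’ -> b | c. The grammar instance and all function names are assumptions chosen for illustration. The common prefix a is matched once, and the choice between b and c is made with a single lookahead symbol, so no backtracking is needed.

```c
#include <assert.h>
#include <stdbool.h>

/* S' -> b | c */
bool parse_S_prime(const char **ip)
{
    if (**ip == 'b' || **ip == 'c') {   /* one lookahead symbol picks the alternative */
        (*ip)++;
        return true;
    }
    return false;
}

/* S -> a S' : the common prefix a is matched exactly once */
bool parse_S(const char **ip)
{
    if (**ip != 'a')
        return false;
    (*ip)++;
    return parse_S_prime(ip);
}

/* Accepts exactly the strings "ab" and "ac" */
bool accepts_factored(const char *input)
{
    assert(input != 0);
    const char *ip = input;
    return parse_S(&ip) && *ip == '\0';
}
```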
First and Follow functions
• The construction of both top-down and bottom-up parsers is aided by two functions, FIRST and FOLLOW,
associated with a grammar G.
• FIRST():
FIRST(X) is the set of terminals that begin the strings derivable from X using the production rules of G.
For example, let S -> α A a β and A -> c γ, where α and c are terminals.
FIRST(S) = {α}
FIRST(A) = {c}
But if S -> α A a β | ε and A -> c γ,
then FIRST(S) = {α, ε}, i.e. the production S -> ε adds ε to FIRST(S) along with α.
• FOLLOW():
FOLLOW(A), for a nonterminal A, is the set of terminals a that can appear immediately to the right of A in some
sentential form, i.e. the set of terminals a such that there exists a derivation of the form
S =>* α A a β for some α and β.
Note that there may have been symbols between A and a at some time during the derivation, but if so,
they derived ε and disappeared. In addition, if A can be the rightmost symbol in some sentential form, then $ is
in FOLLOW(A). FOLLOW of the start symbol always contains $.
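Algorithm for FIRST function
To compute FIRST(X) for all grammar symbols X, apply the following rules until nothing more can be added to any
FIRST set:
1. If X is a terminal, then FIRST(X) = {X}.
2. If X -> ε is a production, then add ε to FIRST(X).
3. If X -> Y1 Y2 ... Yk is a production, then add a to FIRST(X) if a is in FIRST(Yi) for some i and ε is in all of
FIRST(Y1), ..., FIRST(Yi-1). If ε is in FIRST(Yj) for all j = 1, ..., k, then add ε to FIRST(X).
Algorithm for FOLLOW function
To compute FOLLOW(A) for all nonterminals A, apply the following rules until nothing more can be added to any
FOLLOW set:
1. Place $ in FOLLOW(S), where S is the start symbol.
2. If there is a production A -> α B β, then everything in FIRST(β) except ε is in FOLLOW(B).
3. If there is a production A -> α B, or a production A -> α B β where FIRST(β) contains ε, then everything in
FOLLOW(A) is in FOLLOW(B).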
LL(1) Grammar
• A predictive parser, that is, a recursive descent parser needing no backtracking, can be constructed for a class of
grammars called LL(1).
• The first “L” in LL(1) stands for scanning the input from left to right.
• The second “L” stands for producing a leftmost derivation, and the 1 stands for using one input symbol of lookahead
at each step to make parsing decisions.
Construction of LL(1) Parsing table
Rules:
• To construct the parsing table, place the nonterminals vertically (one per row) and the terminals, together with $,
horizontally (one per column).
• For each nonterminal, determine which production it must use to produce each terminal, and place that production
in the cell of that terminal for that nonterminal.
• Blank entries are error entries. Non-blank entries indicate the production with which to expand a nonterminal.
Consider the example:
Production           FIRST     FOLLOW
E -> T E’            id, (     $, )
E’ -> + T E’ | ε     +, ε      $, )
T -> F T’            id, (     +, $, )
T’ -> * F T’ | ε     *, ε      +, $, )
F -> ( E ) | id      id, (     $, *, +, )
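The FIRST and FOLLOW columns above can be reproduced mechanically by iterating until the sets stop growing. The following is a minimal sketch under stated assumptions, not a definitive implementation: each grammar symbol is encoded as one character (e for E’, t for T’, i for id), sets are bit masks over the terminals, and every identifier is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* One-character encoding (assumption): e = E', t = T', i = id.
   An empty rhs stands for an ε-production. */
typedef struct { char lhs; const char *rhs; } Production;

const Production GRAMMAR[] = {
    {'E', "Te"}, {'e', "+Te"}, {'e', ""}, {'T', "Ft"},
    {'t', "*Ft"}, {'t', ""}, {'F', "(E)"}, {'F', "i"},
};
enum { NPROD = sizeof GRAMMAR / sizeof GRAMMAR[0], NNT = 5 };

const char NONTERMINALS[] = "EeTtF";
const char TERMINALS[] = "i+*()$";
#define EPS (1u << 6)                       /* bit reserved for ε */

unsigned first_set[NNT], follow_set[NNT];   /* bit sets over TERMINALS */

int nt_index(char c)                        /* -1 if c is not a nonterminal */
{
    const char *p = strchr(NONTERMINALS, c);
    return p ? (int)(p - NONTERMINALS) : -1;
}

unsigned term_bit(char c)
{
    const char *p = strchr(TERMINALS, c);
    assert(p != NULL && c != '\0');
    return 1u << (p - TERMINALS);
}

/* FIRST of a string of symbols, using the current FIRST estimates */
unsigned first_of(const char *s)
{
    unsigned out = 0;
    for (; *s; s++) {
        int n = nt_index(*s);
        if (n < 0)
            return out | term_bit(*s);      /* a terminal ends the scan */
        out |= first_set[n] & ~EPS;
        if (!(first_set[n] & EPS))
            return out;                     /* this symbol cannot vanish */
    }
    return out | EPS;                       /* every symbol can derive ε */
}

void compute_first_follow(void)
{
    bool changed = true;
    memset(first_set, 0, sizeof first_set);
    memset(follow_set, 0, sizeof follow_set);
    while (changed) {                       /* FIRST: iterate to a fixed point */
        changed = false;
        for (int p = 0; p < NPROD; p++) {
            int a = nt_index(GRAMMAR[p].lhs);
            unsigned before = first_set[a];
            first_set[a] |= first_of(GRAMMAR[p].rhs);
            if (first_set[a] != before)
                changed = true;
        }
    }
    follow_set[nt_index('E')] = term_bit('$');  /* $ follows the start symbol */
    changed = true;
    while (changed) {                       /* FOLLOW: iterate to a fixed point */
        changed = false;
        for (int p = 0; p < NPROD; p++) {
            const char *rhs = GRAMMAR[p].rhs;
            for (int i = 0; rhs[i]; i++) {
                int b = nt_index(rhs[i]);
                if (b < 0)
                    continue;
                unsigned rest = first_of(rhs + i + 1);
                unsigned add = rest & ~EPS;
                if (rest & EPS)             /* the rest can vanish: inherit FOLLOW(lhs) */
                    add |= follow_set[nt_index(GRAMMAR[p].lhs)];
                if ((follow_set[b] | add) != follow_set[b]) {
                    follow_set[b] |= add;
                    changed = true;
                }
            }
        }
    }
}
```

After compute_first_follow() returns, first_set[nt_index('T')] holds the bits for id and (, and follow_set[nt_index('F')] holds the bits for +, *, ) and $, matching the last rows of the table.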
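The LL(1) parsing table (predictive parsing table) for the above grammar, derived from the FIRST and FOLLOW sets, is:
Nonterminal  id          +              *              (            )         $
E            E -> T E’                                 E -> T E’
E’                       E’ -> + T E’                               E’ -> ε   E’ -> ε
T            T -> F T’                                 T -> F T’
T’                       T’ -> ε        T’ -> * F T’                T’ -> ε   T’ -> ε
F            F -> id                                   F -> ( E )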
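Algorithm for construction of LL(1) parsing table or Predictive parsing table
For each production A -> α of the grammar:
1. For each terminal a in FIRST(α), add A -> α to M[A, a].
2. If ε is in FIRST(α), add A -> α to M[A, b] for each b in FOLLOW(A), including $ when $ is in FOLLOW(A).
All entries that remain empty are error entries.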
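The usual construction enters each production A -> α under every terminal in FIRST(α) and, when α can derive ε, under every symbol in FOLLOW(A). A minimal C sketch for the expression grammar follows. It assumes a one-character encoding (e for E’, t for T’, i for id), takes the FIRST and FOLLOW sets from the table above as hard-coded strings, and uses hypothetical names throughout.

```c
#include <assert.h>
#include <string.h>

const char NONTERMINALS[] = "EeTtF";   /* e = E', t = T' (assumed encoding) */
const char TERMINALS[] = "i+*()$";     /* i = id */

typedef struct {
    char lhs;
    const char *rhs;
    const char *first;   /* terminals in FIRST(rhs), ε excluded */
    int derives_eps;     /* 1 if ε is in FIRST(rhs) */
} Production;

const Production GRAMMAR[] = {
    {'E', "Te",  "i(", 0},   /* 0: E  -> T E'   */
    {'e', "+Te", "+",  0},   /* 1: E' -> + T E' */
    {'e', "",    "",   1},   /* 2: E' -> ε      */
    {'T', "Ft",  "i(", 0},   /* 3: T  -> F T'   */
    {'t', "*Ft", "*",  0},   /* 4: T' -> * F T' */
    {'t', "",    "",   1},   /* 5: T' -> ε      */
    {'F', "(E)", "(",  0},   /* 6: F  -> ( E )  */
    {'F', "i",   "i",  0},   /* 7: F  -> id     */
};
enum { NPROD = 8, NNT = 5, NTERM = 6, ERROR_ENTRY = -1 };

/* FOLLOW sets in NONTERMINALS order, copied from the table above */
const char *FOLLOW[NNT] = { "$)", "$)", "+$)", "+$)", "+*$)" };

int M[NNT][NTERM];       /* M[A][a] = production number or ERROR_ENTRY */

int index_in(const char *set, char c)
{
    const char *p = strchr(set, c);
    assert(p != NULL);
    return (int)(p - set);
}

/* Enter production p in M[row, a]; returns 1 if the cell was already taken */
int place(int row, char a, int p)
{
    int *cell = &M[row][index_in(TERMINALS, a)];
    if (*cell != ERROR_ENTRY && *cell != p)
        return 1;
    *cell = p;
    return 0;
}

/* Returns the number of conflicting entries; 0 means the table is LL(1) */
int build_table(void)
{
    int conflicts = 0;
    for (int r = 0; r < NNT; r++)
        for (int c = 0; c < NTERM; c++)
            M[r][c] = ERROR_ENTRY;
    for (int p = 0; p < NPROD; p++) {
        int row = index_in(NONTERMINALS, GRAMMAR[p].lhs);
        for (const char *a = GRAMMAR[p].first; *a; a++)
            conflicts += place(row, *a, p);          /* rule 1: FIRST(α) */
        if (GRAMMAR[p].derives_eps)
            for (const char *b = FOLLOW[row]; *b; b++)
                conflicts += place(row, *b, p);      /* rule 2: FOLLOW(A) */
    }
    return conflicts;
}
```

A non-zero return value from build_table() would signal that two productions compete for the same cell, which is how a grammar that is not LL(1) shows up in this construction.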
Top Down Parsing
• Top-down parsing creates the parse tree for the input string starting from the root and creating the nodes of the
parse tree in depth-first (preorder) order.
• Equivalently, top-down parsing finds a leftmost derivation for the input string.
• Consider the following grammar
E -> T E’
E’ -> + T E’ | ε
T -> F T’
T’ -> * F T’ | ε
F -> ( E ) | id
• The sequence of parse trees built for an input such as id + id * id corresponds to a leftmost derivation of the input.
• A predictive parser chooses between the E’-productions by looking at the next input symbol.
• At each step of a top-down parse, the key problem is that of determining the production to be applied
for a nonterminal.
• Once a production is chosen, the rest of the parsing process consists of “matching” the terminal symbols in the
production body with the input string.
Predictive Recursive Descent Parsing
• Recursive descent parsing is a top-down method of syntax analysis in which a set of recursive procedures is used
to process the input. One procedure is associated with each nonterminal of the grammar.
• A simple form of recursive descent parsing is predictive parsing, in which the lookahead symbol
unambiguously determines the flow of control through the procedure body for each nonterminal.
• The sequence of procedure calls during the analysis of an input string implicitly defines a parse tree for the input.
• It can be used to build the desired parse tree explicitly.
• Predictive parsing relies on information about the first symbols that can be generated by a production body.
Let α be a string of grammar symbols (terminals or nonterminals). Define FIRST(α) to be the set of terminals that
appear as the first symbols of one or more strings of terminals generated from α.
If α is ε or can generate ε, then ε is also in FIRST(α).
• We shall just use ad hoc reasoning to deduce the symbols in FIRST(α). Typically, α will either begin with a
terminal, which is therefore the only symbol in FIRST(α), or α will begin with a nonterminal whose production
bodies begin with terminals, in which case these terminals are the only members of FIRST(α).
• The FIRST sets must be considered if there are two productions A -> α and A -> β.
• Ignoring ε-productions for the moment, predictive parsing requires FIRST(α) and FIRST(β) to be disjoint
sets.
• The lookahead symbol can then be used to decide which production to use.
• If the lookahead symbol is in FIRST(α), then α is used. Otherwise, if the lookahead symbol is in FIRST(β), then β is
used.
Basic steps to construct RD parser
• A parser that uses a collection of recursive procedures to parse the given input string is called a recursive descent
parser.
• In this type of parser, the CFG is used to build the recursive routines.
• For each nonterminal a separate procedure is written, and the body of the procedure is derived from the RHS of the
corresponding nonterminal.
• The RHS of each rule is converted directly into program code, symbol by symbol:
1. If the symbol is a nonterminal, then a call to the procedure corresponding to that nonterminal is made.
2. If the symbol is a terminal, then it is matched with the lookahead from the input. The lookahead pointer is
advanced when the symbol matches.
3. If a production rule has many alternatives, then all these alternatives have to be combined into a single
procedure body.
4. The parser is activated by calling the procedure corresponding to the start symbol.
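Let’s see this with an example: E -> i E’ and E’ -> + i E’ | ε
The input is read from standard input and must end with the end marker $. The procedure for E’ is named
E_prime because ’ cannot appear in a C identifier.
#include <stdio.h>
#include <stdlib.h>

int l;                          /* lookahead symbol */

void Match(int t)               /* consume terminal t or report an error */
{
    if (l == t)
        l = getchar();
    else {
        printf("Error\n");
        exit(1);
    }
}

void E_prime(void)              /* E' -> + i E' | ε */
{
    if (l == '+') {
        Match('+');
        Match('i');
        E_prime();
    }                           /* otherwise E' -> ε: consume nothing */
}

void E(void)                    /* E -> i E' */
{
    Match('i');
    E_prime();
}

int main(void)
{
    l = getchar();              /* read the first lookahead symbol */
    E();
    if (l == '$')
        printf("Success/ accepted\n");
    else
        printf("Error\n");
    return 0;
}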
Non-Recursive Predictive Parser (LL(1) Parsing)
• A non-recursive predictive parser can be built by maintaining the stack explicitly rather than implicitly via
recursive calls.
• The parser mimics a leftmost derivation.
• If w is the input that has been matched so far, then the stack holds a sequence of grammar symbols α such that
S =>* w α   (by a leftmost derivation)
[Figure: Model of non-recursive parsing / table-driven predictive parsing]
• The table-driven parser in the figure has an input buffer, a stack containing a sequence of grammar symbols, a
parsing table constructed by using the algorithm for construction of the parsing table, and an output stream.
• The input buffer contains the string to be parsed, followed by the end marker $.
• We reuse the symbol $ to mark the bottom of the stack, which initially contains the start symbol of the grammar
on top of $.
• The parser is controlled by a program that considers X, the symbol on top of the stack, and a, the current
input symbol.
• If X is a nonterminal, the parser chooses an X-production by consulting entry M[X, a] of the parsing table M.
(Additional code could be executed here, for example, code to construct a node in a parse tree.)
• Otherwise, it checks for a match between the terminal X and the current input symbol a.
• The behaviour of the parser can be described in terms of its configurations, which give the stack contents and
the remaining input.
Algorithm for table driven parsing
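The control program described above can be sketched in C as follows. This is an illustrative sketch rather than the exact algorithm from the slides: it assumes the expression grammar E -> T E’, E’ -> + T E’ | ε, T -> F T’, T’ -> * F T’ | ε, F -> ( E ) | id with a one-character encoding (e for E’, t for T’, i for id), a hand-filled table M derived from that grammar's FIRST and FOLLOW sets, and hypothetical names such as ll1_parse.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

const char NONTERMINALS[] = "EeTtF";   /* e = E', t = T' (assumed encoding) */
const char TERMINALS[]    = "i+*()$";  /* i = id */

/* M[X][a]: right-hand side to push; NULL = error entry, "" = ε */
const char *M[5][6] = {
    /*          i     +      *      (      )     $   */
    /* E  */ { "Te", NULL,  NULL,  "Te",  NULL, NULL },
    /* E' */ { NULL, "+Te", NULL,  NULL,  "",   ""   },
    /* T  */ { "Ft", NULL,  NULL,  "Ft",  NULL, NULL },
    /* T' */ { NULL, "",    "*Ft", NULL,  "",   ""   },
    /* F  */ { "i",  NULL,  NULL,  "(E)", NULL, NULL },
};

/* Returns true if input (which must end with '$') is derivable from E */
bool ll1_parse(const char *input)
{
    char stack[256];
    int top = 0;
    const char *ip = input;

    assert(input != NULL);
    stack[top++] = '$';                 /* bottom-of-stack marker */
    stack[top++] = 'E';                 /* start symbol on top of $ */

    while (top > 0) {
        char X = stack[top - 1];        /* symbol on top of the stack */
        char a = *ip;                   /* current input symbol */
        const char *col = a ? strchr(TERMINALS, a) : NULL;
        const char *row = strchr(NONTERMINALS, X);
        if (!col)
            return false;               /* unknown symbol or missing $ */
        if (!row) {                     /* X is a terminal or $ */
            if (X != a)
                return false;
            if (X == '$')
                return true;            /* stack and input both exhausted */
            top--;                      /* match: pop X, advance input */
            ip++;
        } else {
            const char *rhs = M[row - NONTERMINALS][col - TERMINALS];
            if (!rhs)
                return false;           /* blank entry: syntax error */
            top--;                      /* pop X */
            size_t n = strlen(rhs);
            if ((size_t)top + n > sizeof stack)
                return false;           /* stack would overflow */
            while (n > 0)               /* push rhs reversed, leftmost symbol on top */
                stack[top++] = rhs[--n];
        }
    }
    return false;
}
```

With this encoding the input id + id * id is written "i+i*i$". Pushing each right-hand side in reverse keeps its leftmost symbol on top of the stack, which is what makes the sequence of expansions a leftmost derivation.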
Example: non-recursive predictive parsing
• Let’s consider the productions
E -> T E’
E’ -> + T E’ | ε
T -> F T’
T’ -> * F T’ | ε
F -> ( E ) | id
We have already created the parsing table for the above grammar.
On input id + id * id, the non-recursive predictive parser makes a sequence of moves that corresponds to a
leftmost derivation of the input under the above grammar.
[Table: Moves made by the predictive parser on input id + id * id]
Example: non-recursive predictive parsing
• Let’s consider the production S -> ( S ) | ε
Production         FIRST    FOLLOW
S -> ( S ) | ε     (, ε     $, )
Parsing table for the above grammar:
Nonterminal   (            )         $
S             S -> ( S )   S -> ε    S -> ε
Let’s parse the input string “(( ))”.
The end marker $ always sits at the bottom of the stack, below the start symbol S.
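The moves of the parser on “(( ))” can be traced with a short C sketch. It hard-codes the three table entries for S, expects the input to end with $, and prints one configuration (stack with its top at the right, then the remaining input) per step. The function name parse_parens and the trace format are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Table-driven parse for S -> ( S ) | ε; input must end with '$' */
bool parse_parens(const char *input)
{
    char stack[128] = "$S";     /* $ at the bottom, start symbol on top */
    int top = 2;                /* number of symbols on the stack */
    const char *ip = input;

    assert(input != NULL);
    for (;;) {
        printf("%-12.*s %s\n", top, stack, ip);   /* current configuration */
        char X = stack[top - 1];
        if (X == 'S') {
            top--;                                /* pop S */
            if (*ip == '(') {                     /* M[S, (] = S -> ( S ) */
                if (top + 3 > (int)sizeof stack)
                    return false;
                stack[top++] = ')';               /* push reversed: ( ends on top */
                stack[top++] = 'S';
                stack[top++] = '(';
            } else if (*ip != ')' && *ip != '$') {
                return false;                     /* blank entry: error */
            }                                     /* M[S, )] = M[S, $] = S -> ε */
        } else if (X == *ip) {
            if (X == '$')
                return true;                      /* $ matched $: accept */
            top--;                                /* match terminal */
            ip++;
        } else {
            return false;                         /* terminal mismatch */
        }
    }
}
```

Calling parse_parens("(())$") expands S twice, applies S -> ε when the lookahead becomes ), matches the two closing parentheses, and accepts when $ on the stack meets $ in the input.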