Cd-Unit-2 Part-2i
BOTTOM-UP PARSING
Constructing a parse tree for an input string beginning at the leaves and going towards
the root is called bottom-up parsing.
A general type of bottom-up parser is a shift-reduce parser.
Reduction                    Rightmost derivation
abbcde   (A → b)             S → aABe
aAbcde   (A → Abc)             → aAde
aAde     (B → d)               → aAbcde
aABe     (S → aABe)            → abbcde
S
The reductions trace out the rightmost derivation in reverse.
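The reduction sequence above can be replayed mechanically. The following is a minimal sketch, not a parser: the position of each handle is supplied by hand, and the helper names `reduce_at` and `reduce_to_start` are hypothetical, introduced only for illustration.

```python
def reduce_at(form, position, rhs, lhs):
    """Replace the substring rhs found at `position` in `form` by lhs."""
    end = position + len(rhs)
    if form[position:end] != rhs:
        raise ValueError(f"{rhs!r} does not occur at index {position} of {form!r}")
    return form[:position] + lhs + form[end:]


def reduce_to_start(sentence, steps):
    """Apply each (position, rhs, lhs) reduction in order; return every form."""
    forms = [sentence]
    for position, rhs, lhs in steps:
        forms.append(reduce_at(forms[-1], position, rhs, lhs))
    return forms


# The four reductions of the example; handle positions are chosen by hand.
STEPS = [(1, "b", "A"), (1, "Abc", "A"), (2, "d", "B"), (0, "aABe", "S")]
FORMS = reduce_to_start("abbcde", STEPS)
```

Reading `FORMS` from last to first gives the sentential forms of the rightmost derivation.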
Handles:
A handle of a string is a substring that matches the right side of a production, and whose
reduction to the non-terminal on the left side of the production represents one step along the reverse
of a rightmost derivation.
Example:
Consider the grammar:
E → E+E
E → E*E
E → (E)
E → id
Stack        Input          Action
$ E          +id2*id3 $     shift
$ E+         id2*id3 $      shift
...
$ E+E*E      $              reduce by E → E*E
...
$ E          $              accept
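Conflicts in shift-reduce parsing:
1. Shift-reduce conflict: The parser cannot decide whether to shift or to reduce.
2. Reduce-reduce conflict: The parser cannot decide which of several reductions to make.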
1. Shift-reduce conflict:
Example:
2. Reduce-reduce conflict:
Consider the grammar:
M → R+R | R+c | R
R → c
and input c+c
Stack     Input     Action
$         c+c $     Shift
$ c       +c $      Reduce by R → c
$ R       +c $      Shift
$ R+      c $       Shift
Once c has been shifted, the stack holds $ R+c and the parser can reduce either by R → c or by M → R+c. This is a reduce-reduce conflict.
OPERATOR-PRECEDENCE PARSING
An operator-precedence parser can be constructed from a grammar called an operator grammar. These
grammars have the property that no production has a right side that is ɛ or that contains two adjacent non-terminals.
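The operator-grammar property can be checked directly from the productions. The sketch below assumes a grammar is given as a list of (left side, right side) pairs with an explicit set of non-terminals; the function name `is_operator_grammar` and the two sample grammars are illustrative assumptions, not part of these notes.

```python
def is_operator_grammar(productions, nonterminals):
    """True if no right side is empty or has two adjacent non-terminals."""
    for _lhs, rhs in productions:
        if len(rhs) == 0:                      # an ɛ-production
            return False
        for left, right in zip(rhs, rhs[1:]):  # every adjacent pair of symbols
            if left in nonterminals and right in nonterminals:
                return False
    return True


# Illustrative fragment with a right side made only of non-terminals.
WITH_ADJACENT = [("E", ["E", "A", "E"]), ("E", ["id"]), ("A", ["+"]), ("A", ["*"])]

# The same operators written into the productions of E directly.
REWRITTEN = [("E", ["E", "+", "E"]), ("E", ["E", "*", "E"]),
             ("E", ["(", "E", ")"]), ("E", ["-", "E"]), ("E", ["id"])]

RESULT_BEFORE = is_operator_grammar(WITH_ADJACENT, {"E", "A"})
RESULT_AFTER = is_operator_grammar(REWRITTEN, {"E"})
```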
Example:
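Since the right side EAE has three consecutive non-terminals, the grammar can be rewritten as
follows:
E → E+E | E-E | E*E | E/E | E↑E | -E | id
The operator-precedence relations for the grammar
E → E+E | E-E | E*E | E/E | E↑E | (E) | -E | id
are given in a precedence table that depends on the precedence and associativity assumed for each operator.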
Example:
Consider the grammar E → E+E | E-E | E*E | E/E | E↑E | (E) | id. The input string is id+id*id. The
implementation is as follows:
Advantages of LR parsing:
It recognizes virtually all programming language constructs for which CFG can be
written.
It is an efficient non-backtracking shift-reduce parsing method.
The class of grammars that can be parsed using LR methods is a proper superset of the class of
grammars that can be parsed with predictive parsers.
It detects a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.
Drawbacks of LR method:
It is too much work to construct an LR parser by hand for a programming language
grammar. A specialized tool, called an LR parser generator, is needed. Example: YACC.
[Figure: model of an LR parser, showing the INPUT buffer a1 … ai … an $ and the STACK]
An LR parser consists of an input, an output, a stack, a driver program, and a parsing table that has two
parts (action and goto).
The parsing program reads characters from an input buffer one at a time.
The program uses a stack to store a string of the form s0 X1 s1 X2 s2 … Xm sm, where sm is on
top. Each Xi is a grammar symbol and each si is a state.
The parsing table consists of two parts: the action and goto functions.
Action : The parsing program determines sm, the state currently on top of the stack, and ai, the
current input symbol. It then consults action[sm, ai] in the action table, which can have one of four
values :
1. shift s, where s is a state,
2. reduce by a grammar production A → β,
3. accept, and
4. error.
Goto : The function goto takes a state and grammar symbol as arguments and produces a state.
LR Parsing algorithm:
Input: An input string w and an LR parsing table with functions action and goto for grammar G.
Method: Initially, the parser has s0 on its stack, where s0 is the initial state, and w$ in the input
buffer. The parser then executes the following program :
set ip to point to the first input symbol of w$;
repeat forever begin
    let s be the state on top of the stack
        and a the symbol pointed to by ip;
    if action[s, a] = shift s’ then begin
        push a then s’ on top of the stack;
        advance ip to the next input symbol
    end
    else if action[s, a] = reduce A → β then begin
        pop 2 * |β| symbols off the stack;
        let s’ be the state now on top of the stack;
        push A then goto[s’, A] on top of the stack;
        output the production A → β
    end
    else if action[s, a] = accept then
        return
    else error( )
end
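The driver loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the tables are the SLR(1) tables of the expression grammar used later in these notes, the production numbering (1) E → E+T, (2) E → T, (3) T → T*F, (4) T → F, (5) F → (E), (6) F → id is assumed, and the stack holds states only, so a reduction pops |β| entries instead of 2 * |β|. The names `lr_parse`, `ACTION`, `GOTO` and `PRODUCTIONS` are hypothetical.

```python
# Production number -> (left side, length of right side); numbering is assumed.
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
               4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {0: {"E": 1, "T": 2, "F": 3}, 4: {"E": 8, "T": 2, "F": 3},
        6: {"T": 9, "F": 3}, 7: {"F": 10}}


def lr_parse(tokens):
    """Return the production numbers used, in order; raise SyntaxError on error."""
    stack = [0]                       # states only; grammar symbols are implied
    tokens = list(tokens) + ["$"]
    ip = 0
    output = []
    while True:
        state, symbol = stack[-1], tokens[ip]
        act = ACTION[state].get(symbol)
        if act is None:               # blank entry in the table
            raise SyntaxError(f"unexpected {symbol!r} in state {state}")
        if act == "acc":
            return output
        if act[0] == "s":             # shift: push the new state, advance ip
            stack.append(int(act[1:]))
            ip += 1
        else:                         # reduce by production number n
            n = int(act[1:])
            lhs, length = PRODUCTIONS[n]
            del stack[-length:]
            stack.append(GOTO[stack[-1]][lhs])
            output.append(n)
```

For example, `lr_parse(["id", "*", "id", "+", "id"])` returns the reductions in the order a bottom-up parser performs them, which is the reverse of a rightmost derivation.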
CONSTRUCTING SLR(1) PARSING TABLE:
LR(0) items:
An LR(0) item of a grammar G is a production of G with a dot at some position of the
right side. For example, production A → XYZ yields the four items :
A → . XYZ
A → X . YZ
A → XY . Z
A → XYZ .
The dot indicates how much of the right side has been recognized in the input scanned up to the given
point.
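Enumerating the items of a production amounts to placing the dot at every position of the right side. A small sketch, assuming an item is represented as a (left side, right side, dot position) tuple; the names `lr0_items` and `show` are hypothetical.

```python
def lr0_items(lhs, rhs):
    """All LR(0) items of lhs → rhs, one per dot position 0..len(rhs)."""
    rhs = tuple(rhs)
    return [(lhs, rhs, dot) for dot in range(len(rhs) + 1)]


def show(item):
    """Render an item with a visible dot, e.g. 'A → X . Y Z'."""
    lhs, rhs, dot = item
    return f"{lhs} → {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"


ITEMS = [show(item) for item in lr0_items("A", ["X", "Y", "Z"])]
```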
Closure operation:
If I is a set of items for a grammar G, then closure(I) is the set of items constructed from
I by the two rules:
1. Initially, every item in I is added to closure(I).
2. If A → α . Bβ is in closure(I) and B → γ is a production, then add the item B → .
γ to closure(I), if it is not already there. We apply this rule until no more new items can be
added to closure(I).
Goto operation:
Goto(I, X) is defined to be the closure of the set of all items [A→ αX . β] such
that [A→ α . Xβ] is in I.
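The closure and goto operations can be sketched as below. This is a minimal illustration assuming the augmented expression grammar E’ → E, E → E+T | T, T → T*F | F, F → (E) | id, with items stored as (left side, right side, dot position) tuples. The representation and the names `closure` and `goto` are choices made for this sketch.

```python
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add B → . γ for every item A → α . Bβ until nothing new appears."""
    result = set(items)
    work = list(result)
    while work:
        _lhs, rhs, dot = work.pop()
        if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
            continue                      # dot at the end or before a terminal
        for head, body in GRAMMAR:
            item = (head, body, 0)
            if head == rhs[dot] and item not in result:
                result.add(item)
                work.append(item)
    return frozenset(result)


def goto(items, symbol):
    """Closure of all items A → αX . β such that A → α . Xβ is in items."""
    moved = {(lhs, rhs, dot + 1) for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)


I0 = closure({("E'", ("E",), 0)})
I1 = goto(I0, "E")
```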
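Steps to construct SLR parsing table for grammar G are:
1. Construct C = {I0, I1, …, In}, the collection of sets of LR(0) items for the augmented grammar G’.
2. State i is constructed from Ii. The parsing actions for state i are determined as follows:
(a) If [A → α . aβ] is in Ii and goto(Ii, a) = Ij, then set action[i, a] to “shift j”. Here a must be a terminal.
(b) If [A → α .] is in Ii, then set action[i, a] to “reduce A → α” for all a in FOLLOW(A). Here A may not be S’.
(c) If [S’ → S .] is in Ii, then set action[i, $] to “accept”.
If any conflicting actions are generated by the above rules, we say the grammar is not SLR(1).
3. The goto transitions for state i are constructed for all non-terminals A using the rule: If
goto(Ii, A) = Ij, then goto[i, A] = j.
4. All entries not defined by rules (2) and (3) are made “error”.
5. The initial state of the parser is the one constructed from the set of items containing
[S’ → . S].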
I0 : E’ → . E
E→ . E + T
E→ . T
T →.T*F
T →.F
F → . (E)
F → . id
GOTO ( I0 , E )
I1 : E’ → E .
     E → E . + T
GOTO ( I0 , T )
I2 : E → T .
     T → T . * F
GOTO ( I0 , F )
I3 : T → F .
GOTO ( I4 , T )
I2 : E → T .
     T → T . * F
GOTO ( I4 , F )
I3 : T → F .
GOTO ( I4 , ( )
I4 : F → ( . E )
     E → . E + T
     E → . T
     T → . T * F
     T → . F
     F → . (E)
     F → . id
GOTO ( I4 , id )
I5 : F → id .
GOTO ( I6 , T )
I9 : E → E + T .
     T → T . * F
GOTO ( I6 , F )
I3 : T → F .
GOTO ( I6 , ( )
I4 : F → ( . E )
GOTO ( I9 , * )
I7 : T → T * . F
     F → . (E)
     F → . id
FOLLOW (E) = { $ , ) , + }
FOLLOW (T) = { $ , + , ) , * }
FOLLOW (F) = { * , + , ) , $ }
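These FOLLOW sets can be recomputed with a fixed-point iteration. The sketch below assumes the expression grammar E → E+T | T, T → T*F | F, F → (E) | id with start symbol E, and relies on the fact that this grammar has no ɛ-productions, so FIRST of a string is FIRST of its first symbol. The function names are hypothetical.

```python
GRAMMAR = [
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {"E", "T", "F"}


def first_sets(grammar, nonterminals):
    """FIRST for each non-terminal; valid only without ɛ-productions."""
    first = {n: set() for n in nonterminals}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            head = rhs[0]
            add = first[head] if head in nonterminals else {head}
            if not add <= first[lhs]:
                first[lhs] |= add
                changed = True
    return first


def follow_sets(grammar, nonterminals, start):
    """FOLLOW for each non-terminal; $ marks the end of the input."""
    first = first_sets(grammar, nonterminals)
    follow = {n: set() for n in nonterminals}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            for i, symbol in enumerate(rhs):
                if symbol not in nonterminals:
                    continue
                if i + 1 < len(rhs):          # something follows the symbol
                    nxt = rhs[i + 1]
                    add = first[nxt] if nxt in nonterminals else {nxt}
                else:                         # symbol ends the right side
                    add = follow[lhs]
                if not add <= follow[symbol]:
                    follow[symbol] |= add
                    changed = True
    return follow


FOLLOW = follow_sets(GRAMMAR, NONTERMINALS, "E")
```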
State   ACTION                                    GOTO
        id     +      *      (      )      $      E    T    F
I0      s5                   s4                   1    2    3
I1             s6                          ACC
I2             r2     s7            r2     r2
I3             r4     r4            r4     r4
I4      s5                   s4                   8    2    3
I5             r6     r6            r6     r6
I6      s5                   s4                        9    3
I7      s5                   s4                             10
I8             s6                   s11
I9             r1     s7            r1     r1
I10            r3     r3            r3     r3
I11            r5     r5            r5     r5
Blank entries are error entries.
Stack implementation:
1. Write the Context free Grammar for the given input string
5. Draw DFA
7. Based on the information from the Table, with help of Stack and
parsing algorithm generate the output.
LR (1) item
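An LR (1) item is defined by a production, the position of the dot, and a terminal symbol.
The terminal is called the look-ahead symbol.
Closure rule: if [A -> α•Bβ, a] is in the item set and B -> γ is a production, add the item [B -> •γ, b] for every terminal b in FIRST(βa).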
I0 State:
Add the augmented production and compute the closure. The look-ahead symbol for the augmented production
is $.
I0 = Closure(S`->•S, $)
The dot symbol is followed by a Non terminal S. So, add productions starting with S in
I0 State.
S->•CC, FIRST ($), using the 2nd rule
S->•CC, $
The dot symbol is followed by a Non terminal C. So, add productions starting with C in
I0 State.
C->•cC, FIRST(C, $)
C->•d, FIRST(C, $)
FIRST(C) = {c, d} so, the items are
C->•cC, c/d
C->•d, c/d
The dot symbol is now followed only by terminals, so the I0 State is closed. The items in
I0 are
S`->•S , $
S->•CC , $
C->•cC, c/d
C->•d , c/d
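The construction of I0 can be reproduced with a short LR (1) closure routine. This is a sketch under stated assumptions: the grammar is S` -> S, S -> CC, C -> cC | d (written `S'` in the code), FIRST(S) = FIRST(C) = {c, d} is entered by hand, and there are no ɛ-productions, so FIRST(βa) is FIRST of the first symbol of β, or {a} when β is empty. Each item is a (left side, right side, dot position, look-ahead) tuple, so an item written `c/d` in the notes appears as two tuples. The names `closure1` and `goto1` are hypothetical.

```python
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("C", "C")),
    ("C", ("c", "C")),
    ("C", ("d",)),
]
NONTERMINALS = {"S'", "S", "C"}
FIRST = {"S": {"c", "d"}, "C": {"c", "d"}}   # entered by hand for this grammar


def first_of(beta, lookahead):
    """FIRST(βa), assuming the grammar has no ɛ-productions."""
    if not beta:
        return {lookahead}
    head = beta[0]
    return set(FIRST[head]) if head in NONTERMINALS else {head}


def closure1(items):
    """LR(1) closure: for [A -> α•Bβ, a] add [B -> •γ, b], b in FIRST(βa)."""
    result = set(items)
    work = list(result)
    while work:
        _lhs, rhs, dot, lookahead = work.pop()
        if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
            continue
        for b in first_of(rhs[dot + 1:], lookahead):
            for head, body in GRAMMAR:
                item = (head, body, 0, b)
                if head == rhs[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto1(items, symbol):
    """Move the dot over `symbol` and close the resulting set."""
    moved = {(l, r, d + 1, a) for l, r, d, a in items
             if d < len(r) and r[d] == symbol}
    return closure1(moved)


I0 = closure1({("S'", ("S",), 0, "$")})
I2 = goto1(I0, "C")
```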
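I1 = Goto ( I0, S) = Closure( S`->S•, $) = S`->S•, $
I2 = Goto ( I0, C) = Closure( S->C•C, $)
The dot symbol is followed by the Non terminal C with look-ahead FIRST($) = $. So, add
C->•cC , $
C->•d , $
So, the I2 State is
S->C•C , $
C->•cC , $
C->•d , $
I3 = Goto ( I0, c) = Closure( C->c•C, c/d) =
C->c•C, c/d
C->•cC, c/d
C->•d , c/d
I4 = Goto ( I0, d) = Closure( C->d•, c/d) = C->d•, c/d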
[Figure: DFA of the LR (1) item sets I0 to I9]
I0 --S--> I1 : S`->S•, $
I0 --C--> I2 : S->C•C, $    C->•cC, $    C->•d, $
I0 --c--> I3 : C->c•C, c/d    C->•cC, c/d    C->•d, c/d
I0 --d--> I4 : C->d•, c/d
I2 --C--> I5 : S->CC•, $
I2 --c--> I6 : C->c•C, $    C->•cC, $    C->•d, $
I2 --d--> I7 : C->d•, $
I3 --C--> I8 : C->cC•, c/d
I3 --c--> I3
I3 --d--> I4
I6 --C--> I9 : C->cC•, $
I6 --c--> I6
I6 --d--> I7
Construction of CLR (1) Table
Rule1: if there is an item [A->α•Xβ, b] in Ii and goto(Ii, X) = Ij, where X is a terminal, then
set action [Ii][X] = Shift j.
Rule2: if there is an item [A->α•, b] in Ii and A≠S`, then set action [Ii][b] = reduce by A->α,
written as r followed by the production number.
Rule3: if there is an item [S`->S•, $] in Ii, then set action [Ii][$] = Accept.
Rule4: if there is an item [A->α•Xβ, b] in Ii and goto(Ii, X) = Ij, where X is a non-terminal,
then set goto [Ii][X] = j.
LR (1) Table
The CLR parser avoids conflicts that can appear in an SLR parse table, but it produces more
states than the SLR parser, so its table occupies more space in memory.
LALR parsing can be used instead. Its tables are smaller than the CLR parse table, and
parsing is as efficient as with the CLR parser. Here, sets of LR (1) items that have the same productions
and dot positions but different look-aheads are combined to form a single set of items.
For example, consider the grammar in the previous example. Consider the states I4 and I7
as given below:
I4 = Goto( I0, d) = Closure( C->d•, c/d) = C->d•, c/d
I7 = Goto( I2, d) = Closure( C->d•, $) = C->d•, $
These states differ only in their look-aheads. They have the same productions.
Hence these states are combined to form a single state called I47.
Similarly, the states I3 and I6 differ only in their look-aheads, as given below:
I3= Goto(I0,c)=
C->c•C, c/d
C->•cC, c/d
C->•d , c/d
I6= Goto ( I2, c)=
C->c•C , $
C->•cC , $
C->•d,$
These states differ only in their look-aheads. They have the same productions.
Hence these states are combined to form a single state called I36.
Similarly, the states I8 and I9 differ only in their look-aheads. Hence they are combined to form
the state I89.
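The merging step can be sketched as grouping states by their core, that is, by their items with the look-aheads removed. The code below is an illustration only: it assumes each state is a set of (left side, right side, dot position, look-ahead) tuples, covers a few of the states named above, and uses the hypothetical helper name `merge_by_core`. It does not rebuild the action and goto tables.

```python
from collections import defaultdict


def merge_by_core(states):
    """Merge states whose items agree once look-aheads are dropped."""
    groups = defaultdict(list)
    for name, items in states.items():
        core = frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in items)
        groups[core].append(name)
    merged = {}
    for names in groups.values():
        # I4 and I7 become I47: keep the digits of every merged state.
        new_name = "I" + "".join(name[1:] for name in names)
        merged[new_name] = set().union(*(states[name] for name in names))
    return merged


STATES = {
    "I4": {("C", ("d",), 1, "c"), ("C", ("d",), 1, "d")},
    "I5": {("S", ("C", "C"), 2, "$")},
    "I7": {("C", ("d",), 1, "$")},
    "I8": {("C", ("c", "C"), 2, "c"), ("C", ("c", "C"), 2, "d")},
    "I9": {("C", ("c", "C"), 2, "$")},
}
MERGED = merge_by_core(STATES)
```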
LALR Table
Conflicts in the CLR (1) Parsing
When multiple entries occur in one cell of the table, the situation is said to be a conflict.
Shift-reduce conflict in CLR (1) parsing occurs when a state has
1. A reduced item of the form A -> α•, a and
2. An incomplete item of the form A -> β•aα,
as shown below:
State Ii, with a transition on a to state Ij:
1  A -> β•aα, $
2  B -> b•, a
States    Action          GOTO
          a        $      A    B
Ii        Sj/r2
Reduce-reduce conflict in CLR (1) parsing occurs when a state has two or more
reduced items of the form
1. A -> α•, a
2. B -> β•, a
that is, when two productions in a state Ii are reduced on the same look-ahead symbol,
as shown below:
State Ii:
1  A -> α•, a
2  B -> β•, a
States    Action          GOTO
          a        $      A    B
Ii        r1/r2
String Acceptance using LR Parsing:
Consider the above example with the input string cdd.
States    ACTION                    GOTO
          c      d      $           S    C
I0        S3     S4                 1    2
I1                      ACCEPT
I2        S6     S7                      5
I3        S3     S4                      8
I4        R3     R3
I5                      R1
I6        S6     S7                      9
I7                      R3
I8        R2     R2
I9                      R2
Stack       Input    Action
$0          cdd$     Shift S3
$0c3        dd$      Shift S4
$0c3d4      d$       Reduce with R3, C->d, pop 2*|β| symbols from the stack
$0c3C       d$       Goto ( I3, C) = 8
$0c3C8      d$       Reduce with R2, C->cC, pop 2*|β| symbols from the stack
$0C         d$       Goto ( I0, C) = 2
$0C2        d$       Shift S7
$0C2d7      $        Reduce with R3, C->d, pop 2*|β| symbols from the stack
$0C2C       $        Goto ( I2, C) = 5
$0C2C5      $        Reduce with R1, S->CC, pop 2*|β| symbols from the stack
$0S         $        Goto ( I0, S) = 1
$0S1        $        Accept
LL Parsers vs LR Parsers:
LL starts with only the root nonterminal on the stack, LR ends with only the root nonterminal
on the stack.
LL ends when the stack is empty. But, LR starts with an empty stack.
LL uses the stack for designating what is still to be expected, LR uses the stack for
designating what is already seen.
LL builds the parse tree top down. But, LR builds the parse tree bottom up.
LL continuously pops a nonterminal off the stack, and pushes a corresponding right hand side.
But, LR tries to recognize a right hand side on the stack, pops it, and pushes the corresponding
nonterminal.
LL reads terminal when it pops one off the stack, LR reads terminals while it pushes them on
the stack.
LL uses grammar rules in an order which corresponds to pre-order traversal of the parse tree,
LR does a post-order traversal.