
Lecture 3 Compiler Design

Prof. Dr. Markus Mock (University of Pittsburgh)

Uploaded by

Atul Mathur
Copyright
© Attribution Non-Commercial (BY-NC)

Syntax Analysis

CS2210
Lecture 4

CS2210 Compiler Design 2004/05

Parser
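[Figure: source -> lexical analyzer -> token -> parser -> parse tree -> rest of front end -> IR. The parser asks the lexical analyzer for each token ("get next token"); a symbol table sits alongside both.]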

Parsing = determining whether a string of tokens can be generated by a grammar



Grammars
- Precise, easy-to-understand description of syntax
- Context-free grammars -> efficient parsers (automatically!)
- Help in translation and error detection
  - E.g., attribute grammars
- Easier language evolution
  - Can add new constructs systematically

Syntax Errors
- Many errors are syntactic or exposed by parsing
  - E.g., unbalanced ()
- Error handling goals:
  - Report errors quickly & accurately
  - Recover quickly (continue parsing after error)
  - Little overhead on parse time

Error Recovery
- Panic mode
  - Discard tokens until synchronization token found (often ;)
- Phrase level
  - Local correction: replace a token by another and continue
- Error productions
  - Encode commonly expected errors in grammar
- Global correction
  - Find closest input string that is in L(G)
  - Too costly in practice
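
Panic-mode recovery can be sketched in a few lines. This is only an illustration under assumptions: tokens are plain strings, and the helper name `skip_to_sync` is hypothetical, not part of the lecture.

```python
def skip_to_sync(tokens, pos, sync=(";",)):
    """Panic mode: discard tokens starting at pos up to and including the
    next synchronization token; return the index where parsing resumes."""
    while pos < len(tokens) and tokens[pos] not in sync:
        pos += 1
    # Step past the synchronization token itself, if one was found.
    return min(pos + 1, len(tokens))


tokens = ["x", "=", "(", "1", "+", ";", "y", "=", "2", ";"]
resume = skip_to_sync(tokens, 3)  # suppose the error was detected at index 3
print(tokens[resume:])            # parsing continues with the next statement
```

The parser loses everything up to the `;`, but it can keep reporting errors in later statements, which is the "recover quickly" goal.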

Context-free Grammars
- Precise and easy way to specify the syntactical structure of a programming language
- Efficient recognition methods exist
- Natural specification of many recursive constructs:
  - expr -> expr + expr | term

Context-free Grammar Definition


- Terminals T
  - Symbols which form strings of L(G), G a CFG (= tokens in the scanner), e.g. if, else, id
- Nonterminals N
  - Syntactic variables denoting sets of strings of L(G)
  - Impose hierarchical structure (e.g., precedence rules)
- Start symbol S (∈ N)
  - Denotes the set of strings of L(G)
- Productions P
  - Rules that determine how strings are formed
  - N -> (N|T)*

Example: Expression Grammar


expr -> expr op expr
expr -> (expr)
expr -> - expr
expr -> id
op -> +
op -> -
op -> *
op -> /
op -> ^

- Terminals: {id, +, -, *, /, ^, (, )}
- Nonterminals: {expr, op}
- Start symbol: expr

Notational Conventions
- Terminals
  - a, b, c, ...
  - +, -, ...
  - , . ; etc.
  - 0..9
  - expr or <expr>
- Nonterminals
  - A, B, C, ...
  - S start symbol (if present) or first nonterminal in production list
- Terminal strings
  - u, v, ...
- Grammar symbol strings
  - α, β
- Productions
  - A -> α

Shorthands & Derivations


E -> E + E | E * E | (E) | - E | <id>
- E => -E: "E derives -E"
- => derives in 1 step
- =>* derives in n (0..) steps

More Definitions
- L(G): language generated by G = set of strings derived from S
- S =>+ w: w is a sentence of G (w string of terminals)
- S =>+ α: α is a sentential form of G (string can contain nonterminals)
- G and G' are equivalent :⇔ L(G) = L(G')
- A language generated by a grammar (of the form shown) is called a context-free language

Example
G = ({+, -, *, (, ), <id>}, {E}, E, {E -> E + E, E -> E * E, E -> (E), E -> - E, E -> <id>})

Sentence: -(<id> + <id>)
Derivation: E => -E => -(E) => -(E+E) => -(<id>+E) => -(<id> + <id>)

- Leftmost derivation, i.e. always replace the leftmost nonterminal
- Rightmost derivation analogously
- Left/right sentential form
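
A leftmost derivation is mechanical enough to replay in code. The sketch below assumes sentential forms are lists of symbol strings; `leftmost_step` and `choices` are hypothetical names introduced only for this illustration.

```python
NONTERMINALS = {"E"}


def leftmost_step(form, rhs):
    """Replace the leftmost nonterminal of the sentential form with rhs."""
    for i, sym in enumerate(form):
        if sym in NONTERMINALS:
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no nonterminal left to replace")


# Right-hand side chosen at each step of the derivation of -(<id> + <id>).
choices = [
    ["-", "E"],        # E -> - E
    ["(", "E", ")"],   # E -> (E)
    ["E", "+", "E"],   # E -> E + E
    ["<id>"],          # E -> <id>  (leftmost E)
    ["<id>"],          # E -> <id>  (remaining E)
]

form = ["E"]
steps = [form]
for rhs in choices:
    form = leftmost_step(form, rhs)
    steps.append(form)

print(" => ".join("".join(s) for s in steps))
# E => -E => -(E) => -(E+E) => -(<id>+E) => -(<id>+<id>)
```

Every intermediate list in `steps` is a left sentential form; the last one contains only terminals and is therefore a sentence.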

Parse Trees
E => -E => -(E) => -(E+E) => -(<id>+E) => -(<id> + <id>)

Parse tree = graphical representation of a derivation ignoring replacement order

          E
         / \
        -   E
          / | \
         (  E  )
          / | \
         E  +  E
         |     |
       <id>  <id>

Ambiguous Grammars
- >= 2 different parse trees for some sentence ⇔ >= 2 leftmost/rightmost derivations
- Usually want to have unambiguous grammars
  - E.g. want just one evaluation order: <id> + <id> * <id> should be parsed as <id> + (<id> * <id>), not (<id> + <id>) * <id>
- To keep grammars simple, accept ambiguity and resolve separately (outside of grammar)
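
One way to see the ambiguity concretely is to count parse trees. The sketch below is an illustration, not part of the lecture: it assumes the grammar is restricted to E -> E + E | E * E | <id> (parentheses and unary minus left out), and `count_parse_trees` is a hypothetical helper.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of tokens under E -> E + E | E * E | <id>."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct trees deriving toks[i:j] from E.
        if j - i == 1:
            return 1 if toks[i] == "<id>" else 0
        total = 0
        # Try every operator as the root of the tree.
        for k in range(i + 1, j - 1):
            if toks[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(toks))


print(count_parse_trees(["<id>", "+", "<id>", "*", "<id>"]))  # 2
```

The two trees correspond to <id> + (<id> * <id>) and (<id> + <id>) * <id>; a count above 1 for any sentence is exactly what makes the grammar ambiguous.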

Expressive Power
- CFGs are more powerful than REs
  - Can express matching () with CFGs
  - Can express most properties desired for programming languages
- CFGs cannot express:
  - Identifiers declared before used: L = {wcw | w is in (a|b)*}
  - Parameter checking (#formals = #actuals): L = {a^n b^m c^n d^m | n ≥ 1, m ≥ 1}

Eliminating Ambiguity (1)


The grammar

  stmt -> if expr then stmt
        | if expr then stmt else stmt
        | other

is ambiguous. Sentence: if E1 then if E2 then S1 else S2

Derivation 1 (else attached to the inner if):

  stmt => if expr then stmt
       => if E1 then stmt
       => if E1 then if expr then stmt else stmt
       => if E1 then if E2 then stmt else stmt
       => if E1 then if E2 then S1 else stmt
       => if E1 then if E2 then S1 else S2

Derivation 2 (else attached to the outer if):

  stmt => if expr then stmt else stmt
       => if E1 then stmt else stmt
       => if E1 then if expr then stmt else stmt
       => if E1 then if E2 then stmt else stmt
       => if E1 then if E2 then S1 else stmt
       => if E1 then if E2 then S1 else S2

Which one do we prefer?

Eliminating Ambiguity (2)


The grammar

  stmt -> if expr then stmt | if expr then stmt else stmt | other

is ambiguous for the sentence: if E1 then if E2 then S1 else S2

Unambiguous version (each else is matched with the closest unmatched then):

  stmt           -> matched_stmt | unmatched_stmt
  matched_stmt   -> if expr then matched_stmt else matched_stmt
                  | other
  unmatched_stmt -> if expr then stmt
                  | if expr then matched_stmt else unmatched_stmt

Left Recursion
If for grammar G there is a derivation A =>+ Aα, for some string α, then G is left recursive.

Example:
  S -> Aa | b
  A -> Ac | Sd | ε

Parsing
- = determining whether a string of tokens can be generated by a grammar
- Two classes based on order in which parse tree is constructed:
  - Top-down parsing
    - Start construction at root of parse tree
  - Bottom-up parsing
    - Start at leaves and proceed to root
Recursive Descent Parsing


- A top-down method based on recursive procedures (one for each nonterminal typically)
  - May have to backtrack when wrong production was picked
- Predictive parsing = a recursive descent parsing approach that avoids backtracking
  - More efficient
  - Uses (limited) lookahead to decide what productions to use

Predictive Parser
- Program with a (parsing) procedure for each nonterminal which
  - Decides what production to use (based on lookahead in the input)
  - Uses a production by mimicking the right side

Predictive Parser Example


type   -> simple | ^id | array [simple] of type
simple -> integer | char | num dotdot num

  { consume token t if it is the lookahead, otherwise report an error }
  procedure match(t: token);
  begin
    if lookahead = t then
      lookahead := nexttoken
    else error
  end;

  { one procedure per nonterminal; the lookahead selects the production }
  procedure type;
  begin
    if lookahead is in {integer, char, num} then
      simple
    else if lookahead = '^' then begin
      match('^'); match(id)
    end
    else if lookahead = array then begin
      match(array); match('['); simple; match(']'); match(of); type
    end
    else error
  end;
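
For readers who want to run the example, here is a minimal Python sketch of the same predictive parser. Several things are assumptions for illustration: tokens are plain strings, "$" is appended as an end marker, the `simple` procedure (not shown on the slide) is filled in from the grammar, and the names `Parser`, `ParseError` and `parses` are hypothetical.

```python
class ParseError(Exception):
    pass


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks end of input (assumption)
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, t):
        if self.lookahead == t:
            self.pos += 1
        else:
            raise ParseError(f"expected {t!r}, found {self.lookahead!r}")

    def type_(self):
        # type -> simple | ^id | array [simple] of type
        if self.lookahead in ("integer", "char", "num"):
            self.simple()
        elif self.lookahead == "^":
            self.match("^")
            self.match("id")
        elif self.lookahead == "array":
            self.match("array")
            self.match("[")
            self.simple()
            self.match("]")
            self.match("of")
            self.type_()
        else:
            raise ParseError(f"unexpected {self.lookahead!r}")

    def simple(self):
        # simple -> integer | char | num dotdot num
        if self.lookahead in ("integer", "char"):
            self.match(self.lookahead)
        elif self.lookahead == "num":
            self.match("num")
            self.match("dotdot")
            self.match("num")
        else:
            raise ParseError(f"unexpected {self.lookahead!r}")


def parses(tokens):
    """Return True if the token list is a complete 'type'."""
    parser = Parser(tokens)
    try:
        parser.type_()
        parser.match("$")
        return True
    except ParseError:
        return False


print(parses("array [ num dotdot num ] of integer".split()))
```

Each method decides on a production from one token of lookahead and then mimics the right side: `match` for terminals, a method call for nonterminals.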

Predictive Parsing Obstacles


- expr -> expr + term
  - expr; match(+); term;
  - Infinite recursion (left recursion)
- stmt -> if expr then stmt else stmt | if expr then stmt
  - Common prefix
  - Can't predict production
- Solution
  - Eliminate left recursion
  - Left factoring

Eliminating Left Recursion (1)


- Simple case: immediate left recursion:
  Replace
    A -> Aα | β
  with
    A  -> βA'
    A' -> αA' | ε
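
The immediate case translates almost directly into code. A minimal sketch, assuming right-hand sides are lists of symbol strings with the empty list standing for ε; the function name `eliminate_immediate` and the `A'` naming scheme are choices made for this illustration.

```python
def eliminate_immediate(A, alternatives):
    """Rewrite A -> A α | β as A -> β A', A' -> α A' | ε.

    alternatives: list of right-hand sides (lists of symbols); [] is ε.
    Returns a dict mapping each nonterminal to its new alternatives.
    """
    alphas = [rhs[1:] for rhs in alternatives if rhs[:1] == [A]]
    betas = [rhs for rhs in alternatives if rhs[:1] != [A]]
    if not alphas:  # no immediate left recursion: nothing to do
        return {A: alternatives}
    new = A + "'"
    return {
        A: [beta + [new] for beta in betas],
        new: [alpha + [new] for alpha in alphas] + [[]],
    }


result = eliminate_immediate("expr", [["expr", "+", "term"], ["term"]])
for head, rhss in result.items():
    print(head, "->", " | ".join(" ".join(r) or "ε" for r in rhss))
```

For expr -> expr + term | term this yields expr -> term expr' and expr' -> + term expr' | ε, which a recursive-descent procedure can follow without looping forever.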

Eliminating Left Recursion (2)


Order the nonterminals A1 .. An

  for i := 1 to n do begin
    for j := 1 to i-1 do begin
      replace each production of the form Ai -> Aj γ
      by the productions Ai -> δ1 γ | δ2 γ | ... | δk γ
      where Aj -> δ1 | δ2 | ... | δk are all current Aj productions
    end
    eliminate immediate left recursion among the Ai productions
  end

Example Eliminating Left Recursion


  S -> Aa | b
  A -> Ac | Sd | ε

Order: S, A

  for i := 1 to n do begin
    for j := 1 to i-1 do begin
      replace each production of the form Ai -> Aj γ
      by the productions Ai -> δ1 γ | δ2 γ | ... | δk γ
      where Aj -> δ1 | δ2 | ... | δk are all current Aj productions
    end
    eliminate immediate left recursion among the Ai productions
  end

i=2, j=1: eliminate A -> Sγ.
Replacing A -> Sd gives A -> Ac | Aad | bd | ε

Eliminate immediate left recursion:
  S  -> Aa | b
  A  -> bdA' | A'
  A' -> cA' | adA' | ε
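
The whole procedure can be checked against this example with a short script. This is a sketch under assumptions: a grammar is a dict from nonterminal to a list of right-hand sides (lists of symbols, [] for ε), and the function names are hypothetical.

```python
def eliminate_immediate(A, alternatives):
    """A -> A α | β  becomes  A -> β A',  A' -> α A' | ε."""
    alphas = [rhs[1:] for rhs in alternatives if rhs[:1] == [A]]
    betas = [rhs for rhs in alternatives if rhs[:1] != [A]]
    if not alphas:
        return {A: alternatives}
    new = A + "'"
    return {
        A: [beta + [new] for beta in betas],
        new: [alpha + [new] for alpha in alphas] + [[]],
    }


def eliminate_left_recursion(grammar, order):
    """Apply the ordering algorithm; returns a new grammar dict."""
    g = {A: [list(rhs) for rhs in rhss] for A, rhss in grammar.items()}
    for i, Ai in enumerate(order):
        for Aj in order[:i]:
            expanded = []
            for rhs in g[Ai]:
                if rhs[:1] == [Aj]:
                    # Ai -> Aj γ  becomes  Ai -> δ γ for every Aj -> δ
                    expanded += [delta + rhs[1:] for delta in g[Aj]]
                else:
                    expanded.append(rhs)
            g[Ai] = expanded
        g.update(eliminate_immediate(Ai, g[Ai]))
    return g


grammar = {
    "S": [["A", "a"], ["b"]],
    "A": [["A", "c"], ["S", "d"], []],
}
result = eliminate_left_recursion(grammar, ["S", "A"])
for head, rhss in result.items():
    print(head, "->", " | ".join("".join(r) or "ε" for r in rhss))
```

Run on the grammar above with the order S, A, the script reproduces S -> Aa | b, A -> bdA' | A', A' -> cA' | adA' | ε.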

Left Factoring
- Find the longest common prefix and move the differing suffixes into a new nonterminal
  - stmt  -> if expr then stmt stmt'
  - stmt' -> else stmt | ε
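
A single left-factoring step can be sketched as follows. The representation (lists of symbols, [] for ε) and the names `common_prefix` and `left_factor` are assumptions for this illustration; the sketch groups alternatives by their first symbol and factors each group once, so the new nonterminal may itself need another pass.

```python
def common_prefix(seqs):
    """Longest common prefix of several symbol lists."""
    prefix = []
    for column in zip(*seqs):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix


def left_factor(A, alternatives):
    """One left-factoring step for nonterminal A."""
    groups = {}
    for rhs in alternatives:
        groups.setdefault(rhs[0] if rhs else None, []).append(rhs)

    result = {A: []}
    primes = 0
    for first, group in groups.items():
        if first is None or len(group) == 1:
            result[A].extend(group)  # nothing to factor
            continue
        prefix = common_prefix(group)
        primes += 1
        new = A + "'" * primes
        result[A].append(prefix + [new])
        result[new] = [rhs[len(prefix):] for rhs in group]
    return result


result = left_factor("stmt", [
    ["if", "expr", "then", "stmt", "else", "stmt"],
    ["if", "expr", "then", "stmt"],
    ["other"],
])
for head, rhss in result.items():
    print(head, "->", " | ".join(" ".join(r) or "ε" for r in rhss))
```

After factoring, one token of lookahead is enough: `if` selects the shared prefix, and the choice between `else stmt` and ε is postponed to stmt'.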

Transition Diagrams
- Create initial and final state
- For each production A -> X1 X2 ... Xn create a path from the initial to the final state, with edges labeled X1, X2, ..., Xn

[Figure: transition diagram for E; surviving labels: states 0, 3, 6 and edges T, +, ε]

Non-recursive Predictive Parsers


- Avoid recursion for efficiency reasons
- Typically built automatically by tools

[Figure: table-driven predictive parser. The predictive parsing program reads the input (a + b $), maintains a stack (X Y Z $), consults the parsing table M, and produces the output.]

- M[A,a] gives production
- A symbol on stack
- a input symbol (and $)

Parsing Algorithm
- X symbol on top of stack, a current input symbol
- Stack contents and remaining input called parser configuration (initially $S on stack and complete input string)

1. If X = a = $: halt and announce success
2. If X = a ≠ $: pop X off the stack and advance input to the next symbol
3. If X is a nonterminal: use M[X,a], which contains a production X -> rhs or error
   - Replace X on the stack with rhs or call the error routine, respectively, e.g. for X -> UVW replace X with WVU (U on top)
   - Output the production (or augment parse tree)
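
The three cases map onto a short loop. The sketch below is illustrative only: the toy grammar S -> ( S ) S | ε and its hand-written table are assumptions chosen to keep the table small, and `ll1_parse` and `TABLE` are hypothetical names.

```python
def ll1_parse(table, start, nonterminals, tokens):
    """Table-driven predictive parse.

    Returns the list of productions used (a leftmost derivation) or
    raises SyntaxError. table maps (nonterminal, terminal) -> rhs list.
    """
    stack = ["$", start]          # top of stack is the end of the list
    inp = list(tokens) + ["$"]
    pos = 0
    output = []
    while True:
        X, a = stack[-1], inp[pos]
        if X == "$" and a == "$":           # case 1: success
            return output
        if X in nonterminals:               # case 3: consult M[X, a]
            rhs = table.get((X, a))
            if rhs is None:
                raise SyntaxError(f"no entry M[{X}, {a}]")
            stack.pop()
            stack.extend(reversed(rhs))     # leftmost rhs symbol ends on top
            output.append((X, rhs))
        elif X == a:                        # case 2: match a terminal
            stack.pop()
            pos += 1
        else:
            raise SyntaxError(f"expected {X!r}, found {a!r}")


# Hand-written table for the toy grammar S -> ( S ) S | ε.
TABLE = {
    ("S", "("): ["(", "S", ")", "S"],
    ("S", ")"): [],
    ("S", "$"): [],
}

for X, rhs in ll1_parse(TABLE, "S", {"S"}, list("(())")):
    print(X, "->", " ".join(rhs) or "ε")
```

Pushing the right-hand side in reverse is what keeps the derivation leftmost: the first symbol of rhs is the next one to be expanded or matched.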

Construction of Parsing Table Helpers (1)


- First(α) := set of terminals that begin strings derived from α
  - First(X) = {X} for terminal X
  - If X -> ε is a production, add ε to First(X)
  - For X -> Y1 ... Yk, place a in First(X) if a is in First(Yi) and ε ∈ First(Yj) for j = 1..i-1; if ε ∈ First(Yj) for all j = 1..k, add ε to First(X)
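
These rules are usually applied repeatedly until nothing changes. A minimal fixed-point sketch follows, using the expression grammar E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | id as the test case; the dict representation and the name `compute_first` are assumptions for illustration.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],      # [] stands for ε
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}


def compute_first(grammar):
    """Return First(X) for every nonterminal X of the grammar."""
    first = {A: set() for A in grammar}

    def first_of(sym):
        # First(X) = {X} for a terminal X.
        return first[sym] if sym in grammar else {sym}

    changed = True
    while changed:
        changed = False
        for A, rhss in grammar.items():
            for rhs in rhss:
                add = set()
                for Y in rhs:
                    add |= first_of(Y) - {EPS}
                    if EPS not in first_of(Y):
                        break
                else:
                    # Every Yi can derive ε (or the right side is empty).
                    add.add(EPS)
                if not add <= first[A]:
                    first[A] |= add
                    changed = True
    return first


first = compute_first(GRAMMAR)
for A in GRAMMAR:
    print(f"FIRST({A}) =", sorted(first[A]))
```

The loop stops once a full pass adds no new symbol, which must happen because the sets only grow and the alphabet is finite.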

Construction of Parsing Table Helpers (2)


- Follow(A) := set of terminals a that can appear immediately to the right of A in some sentential form, i.e., S =>* αAaβ for some α, β (a can include $)
  - Place $ in Follow(S), S start symbol, $ right end marker
  - If there is a production A -> αBβ, put everything in First(β) except ε in Follow(B)
  - If there is a production A -> αB, or A -> αBβ where ε is in First(β), then everything in Follow(A) is in Follow(B)
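
Follow can be computed with the same fixed-point idea. The sketch below is self-contained and therefore recomputes First; it again uses the expression grammar E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | id, and all function names are hypothetical.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],      # [] stands for ε
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first, grammar):
    """First of a string of grammar symbols, given First of nonterminals."""
    out = set()
    for Y in symbols:
        f = first[Y] if Y in grammar else {Y}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)  # the whole string can derive ε (or is empty)
    return out


def compute_first(grammar):
    first = {A: set() for A in grammar}
    changed = True
    while changed:
        changed = False
        for A, rhss in grammar.items():
            for rhs in rhss:
                add = first_of_string(rhs, first, grammar)
                if not add <= first[A]:
                    first[A] |= add
                    changed = True
    return first


def compute_follow(grammar, start, first):
    follow = {A: set() for A in grammar}
    follow[start].add(END)                       # rule 1
    changed = True
    while changed:
        changed = False
        for A, rhss in grammar.items():
            for rhs in rhss:
                for i, B in enumerate(rhs):
                    if B not in grammar:
                        continue
                    beta_first = first_of_string(rhs[i + 1:], first, grammar)
                    add = beta_first - {EPS}     # rule 2
                    if EPS in beta_first:        # rule 3 (β empty or nullable)
                        add |= follow[A]
                    if not add <= follow[B]:
                        follow[B] |= add
                        changed = True
    return follow


follow = compute_follow(GRAMMAR, "E", compute_first(GRAMMAR))
for A in GRAMMAR:
    print(f"FOLLOW({A}) =", sorted(follow[A]))
```

An empty β is treated as deriving ε, so the A -> αB case of the third rule needs no separate branch.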

Construction Algorithm
Input: Grammar G
Output: Parsing table M

For each production A -> α do
  1. For each terminal a in FIRST(α), add A -> α to M[A, a]
  2. If ε is in FIRST(α), add A -> α to M[A, b] for each terminal b in FOLLOW(A) ($ counts as a terminal in this step)
Make each undefined entry in M an error

Example
  E  -> TE'
  E' -> +TE' | ε
  T  -> FT'
  T' -> *FT' | ε
  F  -> (E) | id

  FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
  FIRST(E') = {+, ε}
  FIRST(T') = {*, ε}
  FOLLOW(E) = FOLLOW(E') = {), $}
  FOLLOW(T) = FOLLOW(T') = {+, ), $}
  FOLLOW(F) = {+, *, ), $}

[Table: parsing table M with rows E, E', T, T', F and columns id, +, *, (, ...; the entries did not survive extraction]
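
The table for this grammar can be derived by applying the construction algorithm to the FIRST and FOLLOW sets listed above. The sketch takes those sets as given data rather than computing them; the names `build_table` and `first_of_string` and the dict-based table layout are assumptions for illustration.

```python
EPS = "ε"

PRODUCTIONS = [
    ("E", ["T", "E'"]),
    ("E'", ["+", "T", "E'"]),
    ("E'", []),                 # [] stands for ε
    ("T", ["F", "T'"]),
    ("T'", ["*", "F", "T'"]),
    ("T'", []),
    ("F", ["(", "E", ")"]),
    ("F", ["id"]),
]

# FIRST and FOLLOW sets copied from the example.
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}


def first_of_string(symbols):
    """FIRST of a right-hand side; terminals are their own FIRST set."""
    out = set()
    for Y in symbols:
        f = FIRST.get(Y, {Y})
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out


def build_table(productions):
    """Map (nonterminal, terminal) to the list of applicable right sides."""
    table = {}
    for A, rhs in productions:
        f = first_of_string(rhs)
        targets = f - {EPS}                  # step 1
        if EPS in f:
            targets = targets | FOLLOW[A]    # step 2
        for a in targets:
            table.setdefault((A, a), []).append(rhs)
    return table                             # missing keys are error entries


table = build_table(PRODUCTIONS)
is_ll1 = all(len(entries) == 1 for entries in table.values())

for (A, a), entries in sorted(table.items()):
    print(f"M[{A}, {a}] =", " / ".join(f"{A} -> {' '.join(r) or EPS}" for r in entries))
print("LL(1):", is_ll1)
```

Storing a list per entry makes conflicts visible: an entry with more than one production is a multiply defined entry, and none occurs for this grammar.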

LL(1)
- A grammar whose parsing table has no multiply defined entries is said to be LL(1)
  - First L = left to right input scanning
  - Second L = leftmost derivation
  - (1) = 1 token lookahead
- Not all grammars can be brought to LL(1) form, i.e., there are languages that do not fall into the LL(1) class