1
A Simple Syntax-Directed
Translator
Chapter 2
2
Building a Simple Compiler
• Building our compiler involves:
– Defining the syntax of a programming language
– Developing a source code parser (for our compiler
we will use predictive parsing)
– Implementing syntax-directed translation to
generate intermediate code
– Optimizing the intermediate code
– Generating target code
– Optimizing the target code
3
The Structure of our Compiler
4
Syntax Definition
• A context-free grammar is a 4-tuple with
– A set of tokens (terminal symbols)
– A set of nonterminals
– A set of productions
– A designated start symbol
5
Example Grammar
Context-free grammar for simple expressions, written as
<nonterminals, terminals, productions, start symbol>:
G = <{list,digit}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, list>
with productions P =
list → list + digit
list → list - digit
list → digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
6
Derivation
• Given a CF grammar we can determine the
set of all strings (sequences of tokens)
generated by the grammar using derivation
– We begin with the start symbol
– In each step, we replace one nonterminal in the
current sentential form with one of the right-
hand sides of a production for that nonterminal
7
Derivation for the Example
Grammar
list
⇒ list + digit
⇒ list - digit + digit
⇒ digit - digit + digit
⇒ 9 - digit + digit
⇒ 9 - 5 + digit
⇒ 9 - 5 + 2
This is an example of a leftmost derivation, because we replaced
the leftmost nonterminal in each step.
Likewise, a rightmost derivation replaces the rightmost
nonterminal in each step.
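The derivation above can be reproduced mechanically. The following is a minimal sketch, not part of the original slides: the dictionary layout of the productions and the helper names `leftmost_step` and `derive` are assumptions made for illustration. Each step replaces the leftmost nonterminal with a chosen alternative.

```python
# Productions of the example grammar G; the dict layout is an assumption.
PRODUCTIONS = {
    "list": [["list", "+", "digit"], ["list", "-", "digit"], ["digit"]],
    "digit": [[str(d)] for d in range(10)],
}


def leftmost_step(form, choice):
    """Replace the leftmost nonterminal in `form` with alternative `choice`."""
    for i, symbol in enumerate(form):
        if symbol in PRODUCTIONS:  # terminals are not keys of PRODUCTIONS
            return form[:i] + PRODUCTIONS[symbol][choice] + form[i + 1:]
    raise ValueError("no nonterminal left to expand")


def derive(start, choices):
    """Return every sentential form produced by applying `choices` in order."""
    forms = [[start]]
    for choice in choices:
        forms.append(leftmost_step(forms[-1], choice))
    return forms


# Index of the alternative chosen at each step of the derivation of 9 - 5 + 2.
steps = derive("list", [0, 1, 2, 9, 5, 2])
for form in steps:
    print(" ".join(form))
```

The last line printed is the derived string, 9 - 5 + 2; the lines before it are the intermediate sentential forms.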
8
Parse Trees
• The root of the tree is labeled by the start symbol
• Each leaf of the tree is labeled by a terminal
(= token) or ε
• Each interior node is labeled by a nonterminal
• If A → X1 X2 … Xn is a production, then node A has
immediate children X1, X2, …, Xn, where each Xi is a
terminal, a nonterminal, or ε (ε denotes the empty string)
9
Parse Tree for the Example
Grammar
Parse tree of the string 9-5+2 using grammar G
(children are indented under their parent):
list
  list
    list
      digit
        9
    -
    digit
      5
  +
  digit
    2
The sequence of leaves, read from left to right (here 9 - 5 + 2),
is called the yield of the parse tree.
10
Ambiguity
Consider the following context-free grammar:
G = <{string}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, string>
with productions P =
string → string + string | string - string | 0 | 1 | … | 9
This grammar is ambiguous, because the string 9-5+2 has more
than one parse tree.
11
Ambiguity (cont’d)
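Two parse trees for 9-5+2 (children are indented under their parent).
Tree 1, grouping (9-5)+2:
string
  string
    string
      9
    -
    string
      5
  +
  string
    2
Tree 2, grouping 9-(5+2):
string
  string
    9
  -
  string
    string
      5
    +
    string
      2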
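To see why the ambiguity matters, the sketch below enumerates every parse tree the ambiguous grammar allows for a token sequence and evaluates each one. It is an illustrative sketch only: the tuple representation of trees and the function names `parse_trees` and `evaluate` are assumptions, not part of the slides.

```python
def parse_trees(tokens):
    """Yield every parse tree for `tokens` under
    string -> string + string | string - string | digit."""
    if len(tokens) == 1:
        yield tokens[0]  # a single digit is a leaf
        return
    # Try each operator position as the root of the tree.
    for i in range(1, len(tokens), 2):
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                yield (left, tokens[i], right)


def evaluate(tree):
    """Compute the arithmetic value that a given parse tree implies."""
    if isinstance(tree, str):
        return int(tree)
    left, op, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a - b


trees = list(parse_trees(["9", "-", "5", "+", "2"]))
for tree in trees:
    print(tree, "=", evaluate(tree))
```

The two trees give different values, 6 for (9-5)+2 and 2 for 9-(5+2), so the grammar alone does not fix the meaning of the string.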
12
Associativity of Operators
Left-associative operators have left-recursive productions
left → left + term | term
String a+b+c has the same meaning as (a+b)+c
Right-associative operators have right-recursive productions
right → term = right | term
String a=b=c has the same meaning as a=(b=c)
13
Precedence of Operators
Operators with higher precedence “bind more tightly”
expr → expr + term | term
term → term * factor | factor
factor → number | ( expr )
String 2+3*5 has the same meaning as 2+(3*5)
Parse tree for 2+3*5 (children are indented under their parent):
expr
  expr
    term
      factor
        number
          2
  +
  term
    term
      factor
        number
          3
    *
    factor
      number
        5
14
Syntax of Statements
stmt → id := expr
| if expr then stmt
| if expr then stmt else stmt
| while expr do stmt
| begin opt_stmts end
opt_stmts → stmt ; opt_stmts
| ε
15
Syntax-Directed Translation
• Uses a CF grammar to specify the syntactic
structure of the language
• AND associates a set of attributes with the
terminals and nonterminals of the grammar
• AND associates with each production a set of
semantic rules to compute values of attributes
• A parse tree is traversed and semantic rules
applied: after the tree traversal(s) are completed,
the attribute values on the nonterminals contain
the translated form of the input
16
Synthesized and Inherited
Attributes
• An attribute is said to be …
– synthesized if its value at a parse-tree node is
determined from the attribute values at the
children of the node
– inherited if its value at a parse-tree node is
determined by the parent (by enforcing the
parent’s semantic rules)
17
Example Attribute Grammar
Production               Semantic Rule
expr → expr1 + term      expr.t := expr1.t // term.t // “+”
expr → expr1 - term      expr.t := expr1.t // term.t // “-”
expr → term              expr.t := term.t
term → 0                 term.t := “0”
term → 1                 term.t := “1”
…                        …
term → 9                 term.t := “9”
Here // is the string concatenation operator.
18
Example Annotated Parse Tree
19
Depth-First Traversals
procedure visit(n : node);
begin
    for each child m of n, from left to right do
        visit(m);
    evaluate semantic rules at node n
end
20
Depth-First Traversals (Example)
Annotated parse tree for 9 - 5 + 2 (children are indented under their parent):
expr.t = “95-2+”
  expr.t = “95-”
    expr.t = “9”
      term.t = “9”
        9
    -
    term.t = “5”
      5
  +
  term.t = “2”
    2
Note: all attributes are of the synthesized type.
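The depth-first evaluation of the synthesized attribute t can be sketched in a few lines. This is an illustration under stated assumptions: the `Node` class, its fields, and the `term_node` helper are hypothetical, and a leaf is assumed to carry its own token text as its t value (the slides assign it at the term level instead).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                    # grammar symbol or token text
    children: list = field(default_factory=list)
    t: str = ""                                   # synthesized attribute


def visit(n):
    """Visit the children left to right, then evaluate the rule at n."""
    for m in n.children:
        visit(m)
    if not n.children:
        n.t = n.label                    # leaf: the token text (assumption)
    elif len(n.children) == 1:
        n.t = n.children[0].t            # expr -> term, term -> digit
    else:
        left, op, right = n.children
        n.t = left.t + right.t + op.t    # expr -> expr1 op term


def term_node(digit):
    """Build the subtree term -> digit (hypothetical helper)."""
    return Node("term", [Node(digit)])


# Parse tree of 9 - 5 + 2 under the left-recursive expression grammar.
tree = Node("expr", [
    Node("expr", [Node("expr", [term_node("9")]), Node("-"), term_node("5")]),
    Node("+"),
    term_node("2"),
])
visit(tree)
print(tree.t)
```

Because every attribute is synthesized, a single post-order pass is enough: each node's value depends only on values already computed at its children.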
21
Translation Schemes
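• A translation scheme is a CF grammar
embedded with semantic actions
rest → + term { print(“+”) } rest
Here { print(“+”) } is an embedded semantic action. In the parse tree it
appears as a child of rest, at the position it occupies in the production:
rest
  +
  term
  { print(“+”) }
  rest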
22
Example Translation Scheme
expr → expr + term { print(“+”) }
expr → expr - term { print(“-”) }
expr → term
term → 0 { print(“0”) }
term → 1 { print(“1”) }
… …
term → 9 { print(“9”) }
23
Example Translation Scheme
(cont’d)
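Parse tree for 9-5+2 with the actions attached as extra children
(children are indented under their parent):
expr
  expr
    expr
      term
        9
        { print(“9”) }
    -
    term
      5
      { print(“5”) }
    { print(“-”) }
  +
  term
    2
    { print(“2”) }
  { print(“+”) }
A depth-first traversal executes the actions in the order 9 5 - 2 +.
Translates 9-5+2 into postfix 95-2+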
24
Parsing Problem
The parsing problem: given a string of symbols (tokens) and a grammar for
the language, construct the parse tree or report that the string is
syntactically incorrect.
For correct strings:
Sentence + grammar → parse tree
For a compiler, a sentence is a program:
Program + grammar → parse tree
Types of parsers:
Top-down parsing (e.g. recursive descent; predictive parsing is the variant
that needs no backtracking)
Bottom-up parsing
We will focus on top-down parsing at present.
25
Top Down Parsing
Recursive descent parsing uses recursive procedures to model the parse
tree to be constructed. The parse tree is built from the top down, following
a leftmost derivation.
For each nonterminal (syntactic class) in the grammar, we construct a
procedure that parses that syntactic class. Parsing begins by calling the
procedure for the start symbol.
Consider the expression grammar:
E → T E’
E’ → + T E’ | ε
T → F T’
T’ → * F T’ | ε
F → ( E ) | id
The following procedures can parse strings top-down in this language:
26
Recursive Descent
Procedure E
begin { E }
    call T
    call E’
    print (“ E found ”)
end { E }

Procedure E’
begin { E’ }
    if token = “+” then
    begin { IF }
        print (“ + found ”)
        Get next token
        call T
        call E’
    end { IF }
    print (“ E’ found ”)
end { E’ }

Procedure T
begin { T }
    call F
    call T’
    print (“ T found ”)
end { T }

Procedure T’
begin { T’ }
    if token = “*” then
    begin { IF }
        print (“ * found ”)
        Get next token
        call F
        call T’
    end { IF }
    print (“ T’ found ”)
end { T’ }

Procedure F
begin { F }
    case token is
        “(”:
            print (“ ( found ”)
            Get next token
            call E
            if token = “)” then
            begin { IF }
                print (“ ) found ”)
                Get next token
                print (“ F found ”)
            end { IF }
            else
                call ERROR
        “id”:
            print (“ id found ”)
            Get next token
            print (“ F found ”)
        otherwise:
            call ERROR
end { F }
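The procedures above translate almost line for line into a runnable recognizer. The sketch below is one possible rendering, not the slides' own code: the `Parser` class, the `ParseError` exception, the `accepts` helper, and the "$" end-of-input marker are all assumptions made for illustration, and the trace printing is omitted.

```python
class ParseError(Exception):
    """Raised where the pseudocode would call ERROR."""


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks end of input (assumption)
        self.pos = 0

    @property
    def token(self):
        return self.tokens[self.pos]

    def advance(self):                      # "Get next token"
        self.pos += 1

    def E(self):                            # E -> T E'
        self.T()
        self.E_rest()

    def E_rest(self):                       # E' -> + T E' | ε
        if self.token == "+":
            self.advance()
            self.T()
            self.E_rest()

    def T(self):                            # T -> F T'
        self.F()
        self.T_rest()

    def T_rest(self):                       # T' -> * F T' | ε
        if self.token == "*":
            self.advance()
            self.F()
            self.T_rest()

    def F(self):                            # F -> ( E ) | id
        if self.token == "(":
            self.advance()
            self.E()
            if self.token != ")":
                raise ParseError("expected )")
            self.advance()
        elif self.token == "id":
            self.advance()
        else:
            raise ParseError(f"unexpected token {self.token!r}")


def accepts(tokens):
    """Return True if the whole token list is a sentence of the grammar."""
    parser = Parser(tokens)
    try:
        parser.E()
    except ParseError:
        return False
    return parser.token == "$"


print(accepts(["id", "+", "id", "*", "id"]))
print(accepts(["id", "+"]))
```

Each ε-alternative shows up as an `if` with no `else`: when the lookahead token does not start the non-empty alternative, the procedure simply returns without consuming input.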
27
Left Recursion & Top-Down
Ambiguity is not the only problem associated with recursive descent parsing.
Other problems to be aware of are left recursion and common prefixes (which
call for left factoring):
Left recursion: A grammar is left recursive if it has a nonterminal A such that
there is a derivation A ⇒+ Aα for some string α.
A is left-recursive if the leftmost symbol in any of its alternatives rewrites,
either immediately (direct left recursion) or through other nonterminal
definitions (indirect/hidden left recursion), to a string with A on the left.
Top-down parsing methods cannot handle left-recursive grammars,
so a transformation is needed to eliminate left recursion.
28
Prediction and Left Recursion
Immediate left recursion: A → Aα
E.g., Expr → Expr + Term
Top-down parser implementation:
function expr() {
    expr(); match(‘+’); term();
}
Do you see the problem? expr() calls itself before consuming any input,
so the recursion never terminates.
Indirect left recursion: A → Ba | C
B → Ab | D
A ⇒ Ba ⇒ Aba
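The problem with the left-recursive procedure can be observed directly. The sketch below transcribes Expr → Expr + Term naively into Python; the function names and the simplification that a Term is a single token are assumptions for illustration. Python reports the runaway recursion as a `RecursionError`.

```python
def naive_expr(tokens, pos=0):
    """Direct transcription of Expr -> Expr + Term (left-recursive)."""
    pos = naive_expr(tokens, pos)   # recurses before consuming any token
    if tokens[pos] != "+":
        raise SyntaxError("expected +")
    return pos + 2                  # skip "+" and a one-token Term (simplification)


def recursion_never_ends(tokens):
    """Return True if parsing `tokens` exhausts the call stack."""
    try:
        naive_expr(tokens)
    except RecursionError:
        return True
    return False


print(recursion_never_ends(["9", "+", "5"]))
```

The input is irrelevant here: `pos` never advances before the recursive call, so every call is made in exactly the same state as the one before it.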
29
Removing Left Recursion
• Left recursion is eliminated by converting the grammar into a right-recursive
grammar.
• If we have the left-recursive pair of productions
A → Aα | β (left-recursive grammar)
where β does not begin with A,
• then we can eliminate left recursion by replacing the pair of productions
with
A → βA’
A’ → αA’ | ε (right-recursive grammar)
• The right-recursive grammar generates the same language as the
left-recursive grammar.
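The replacement rule can be applied mechanically to any nonterminal with immediate left recursion. The following is a minimal sketch: the function name `eliminate_left_recursion`, the list-of-symbols representation, and the use of the string "ε" for the empty alternative are assumptions chosen for illustration. It handles immediate left recursion only, not the indirect kind.

```python
def eliminate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> Aα1 | ... | β1 | ... as A -> βi A', A' -> αj A' | ε.

    `alternatives` is a list of right-hand sides, each a list of symbols.
    """
    new_nt = nonterminal + "'"
    # The α parts: alternatives that start with A, with the leading A removed.
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # The β parts: alternatives that do not start with A.
    betas = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not alphas:
        return {nonterminal: alternatives}  # nothing to eliminate
    return {
        nonterminal: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [["ε"]],
    }


result = eliminate_left_recursion("E", [["E", "+", "T"], ["T"]])
for head, bodies in result.items():
    print(head, "→", " | ".join(" ".join(body) for body in bodies))
```

For E → E + T | T this yields E → T E' and E' → + T E' | ε, matching the pattern A → βA’, A’ → αA’ | ε with α = + T and β = T.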
30
Right Recursive Expressions
• Consider the following grammar and eliminate left recursion:
E → E + T | T
T → T x F | F
F → id
• Solution: the grammar after eliminating left recursion is
E → T E’
E’ → + T E’ | ε
T → F T’
T’ → x F T’ | ε
F → id
31
Syntax Directed Left Rec
Syntax-directed translation adds semantic rules to be carried
out when syntactic rules are applied. Let’s convert infix to
postfix.
Expr → Expr + Term {out(“ + ”);}
| Expr - Term {out(“ - ”);}
| Term
Term → Term * Factor {out(“ * ”);}
| Term / Factor {out(“ / ”);}
| Factor
Factor → ( Expr )
| int {out(“ ”, int.val, “ ”);}
32
How It Works
Examples of applying previous syntax
directed translation
Input: 15 - 20 + 7 * 3 / 2
Output: 15 20 - 7 3 * 2 / +
Input: 15 - 20 - 7 + 3 * 2
Output: 15 20 - 7 - 3 2 * +
33
Direct Placement of Actions
Expr → Term ExprRest
ExprRest → + Term ExprRest {out(“ + ”);}
| - Term ExprRest {out(“ - ”);}
| ε
Term → Factor TermRest
TermRest → * Factor TermRest {out(“ * ”);}
| / Factor TermRest {out(“ / ”);}
| ε
Factor → ( Expr )
| int {out(“ ”, int.val, “ ”);}
34
Problems Galore
Examples of applying previous syntax
directed translation
Input: 15 - 20 + 7 * 3 / 2
Output: 15 20 7 3 2 / * + - (In error)
Input: 15 - 20 - 7 + 3 * 2
Output: 15 20 7 3 2 * + - - (In error)
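The erroneous outputs can be reproduced with a small recursive descent translator that runs each action at the very end of its production, that is, after the recursive ExprRest or TermRest call has returned. This is an illustrative sketch: the function name `translate_actions_last`, the shared `rest` helper, and the "$" end marker are assumptions.

```python
def translate_actions_last(tokens):
    """Translate with every action placed at the END of its production."""
    toks = list(tokens) + ["$"]          # "$" end marker is an assumption
    out, pos = [], 0

    def peek():
        return toks[pos]

    def advance():
        nonlocal pos
        pos += 1

    def factor():                        # Factor -> ( Expr ) | int
        if peek() == "(":
            advance()
            expr()
            if peek() != ")":
                raise SyntaxError("expected )")
            advance()
        elif peek().isdigit():
            out.append(peek())
            advance()
        else:
            raise SyntaxError(f"unexpected token {peek()!r}")

    def rest(ops, operand):              # ExprRest / TermRest share one shape
        if peek() in ops:
            op = peek()
            advance()
            operand()
            rest(ops, operand)
            out.append(op)               # runs only after the whole tail is done

    def term():
        factor()
        rest(("*", "/"), factor)

    def expr():
        term()
        rest(("+", "-"), term)

    expr()
    return " ".join(out)


print(translate_actions_last("15 - 20 + 7 * 3 / 2".split()))
```

Every operator is emitted only after the rest of the expression has been translated, so the operators come out in reverse order and no longer follow their own operands.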
35
Treat Actions as Terminals
Expr → Term ExprRest
ExprRest → + Term {out(“ + ”);} ExprRest
| - Term {out(“ - ”);} ExprRest
| ε
Term → Factor TermRest
TermRest → * Factor {out(“ * ”);} TermRest
| / Factor {out(“ / ”);} TermRest
| ε
Factor → ( Expr )
| int {out(“ ”, int.val, “ ”);}
36
Top Down Parsing
Recursive descent parsing uses recursive procedures to model the parse
tree to be constructed. The parse tree is built from the top down, following
a leftmost derivation.
For each nonterminal (syntactic class) in the grammar, we construct a
procedure that parses that syntactic class. Parsing begins by calling the
procedure for the start symbol.
Consider the expression grammar
G = ({E, E’, T, T’, F}, {+, -, *, /, (, ), id}, E,
{ E → T E’
E’ → + T E’ | - T E’ | ε
T → F T’
T’ → * F T’ | / F T’ | ε
F → ( E ) | id })
The following procedures have to be written:
37
Recursive Descent
Procedure E
begin { E }
    call T
    call E’
end { E }

Procedure E’
begin { E’ }
    if token = “+” then
    begin { addition }
        nextsy()
        call T
        out(“ + ”)
        call E’
    end { addition }
    if token = “-” then
    begin { subtraction }
        nextsy()
        call T
        out(“ - ”)
        call E’
    end { subtraction }
end { E’ }

Procedure T
begin { T }
    call F
    call T’
end { T }

Procedure T’
begin { T’ }
    if token = “*” then
    begin { multiply }
        nextsy()
        call F
        out(“ * ”)
        call T’
    end { multiply }
    if token = “/” then
    begin { divide }
        nextsy()
        call F
        out(“ / ”)
        call T’
    end { divide }
end { T’ }

Procedure F
begin { F }
    case token is
        “(”:
            nextsy()
            call E
            if token = “)” then
                nextsy()
            else
                ERROR()
        “id”:
            out( id.val )
            nextsy()
        otherwise:
            ERROR()
end { F }
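A runnable rendering of these procedures shows that emitting the operator between the operand and the recursive call yields correct postfix. This is a sketch under stated assumptions: the function name `to_postfix`, the "$" end marker, and the use of integer literals in place of id tokens are choices made for illustration.

```python
def to_postfix(tokens):
    """Translate infix tokens to postfix, emitting each operator right after
    its right operand and before the recursive call for the rest."""
    toks = list(tokens) + ["$"]          # "$" end marker is an assumption
    out, pos = [], 0

    def peek():
        return toks[pos]

    def nextsy():
        nonlocal pos
        pos += 1

    def factor():                        # F -> ( E ) | operand
        if peek() == "(":
            nextsy()
            expr()
            if peek() != ")":
                raise SyntaxError("expected )")
            nextsy()
        elif peek().isdigit():
            out.append(peek())           # out(id.val)
            nextsy()
        else:
            raise SyntaxError(f"unexpected token {peek()!r}")

    def term_rest():                     # T' -> * F T' | / F T' | ε
        if peek() in ("*", "/"):
            op = peek()
            nextsy()
            factor()
            out.append(op)               # action before the recursive call
            term_rest()

    def term():                          # T -> F T'
        factor()
        term_rest()

    def expr_rest():                     # E' -> + T E' | - T E' | ε
        if peek() in ("+", "-"):
            op = peek()
            nextsy()
            term()
            out.append(op)               # action before the recursive call
            expr_rest()

    def expr():                          # E -> T E'
        term()
        expr_rest()

    expr()
    if peek() != "$":
        raise SyntaxError("trailing input")
    return " ".join(out)


print(to_postfix("15 - 20 + 7 * 3 / 2".split()))
```

The only difference from a version with the actions at the end of each rule is the position of the `out.append(op)` line relative to the recursive call, which is exactly what treating the action as a terminal in the original rule preserves.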
38
Process
• Write a left-recursive grammar with semantic
actions.
• Rewrite it as a right-recursive grammar, treating the
actions as terminals in the original rules.
• Develop a recursive descent parser.
39
Left Factoring
When we have rules like
A → αβ | αγ
it is unclear which alternative to choose, because both begin with α.
Factor as
A → αX
X → β | γ
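Left factoring can also be sketched as a small grammar transformation. The version below is a simplification made for illustration: the function name `left_factor` is hypothetical, it only factors a prefix shared by all alternatives of one nonterminal, and it names the new nonterminal by appending a prime rather than using X.

```python
def left_factor(nonterminal, alternatives):
    """Factor out the longest prefix common to ALL alternatives.

    `alternatives` is a list of right-hand sides, each a list of symbols.
    """
    prefix = []
    for symbols in zip(*alternatives):   # walk the alternatives in lockstep
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    if not prefix or len(alternatives) < 2:
        return {nonterminal: alternatives}  # nothing to factor
    new_nt = nonterminal + "'"
    # What remains of each alternative after the prefix; empty becomes ε.
    tails = [alt[len(prefix):] or ["ε"] for alt in alternatives]
    return {nonterminal: [prefix + [new_nt]], new_nt: tails}


factored = left_factor("stmt", [
    ["if", "expr", "then", "stmt"],
    ["if", "expr", "then", "stmt", "else", "stmt"],
])
for head, bodies in factored.items():
    print(head, "→", " | ".join(" ".join(body) for body in bodies))
```

After factoring, a predictive parser can commit to the shared prefix first and postpone the choice between the tails until it sees the next token.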