Error Handling
Syntax-Directed Translation
Recursive Descent Parsing
CS143
Lecture 6
Instructor: Fredrik Kjolstad
Slide design by Prof. Alex Aiken, with modifications
1
Announcements
• PA1 & WA1
– Due today at midnight
• PA2 & WA2
– Assigned today
2
Outline
• Extensions of CFG for parsing
– Precedence declarations
– Error handling
– Semantic actions
• Constructing an abstract syntax tree (AST)
• Recursive descent
3
Error Handling
• The purpose of the compiler is
– To detect invalid programs
– To translate the valid ones
• Many kinds of possible errors
Error kind Example (C) Detected by …
Lexical …$… Lexer
Syntax … x *% … Parser
Semantic … int x; y = x(3); … Type checker
Correctness your favorite program Tester/User
4
Syntax Error Handling
• Error handler should
– Report errors accurately and clearly
– Recover from an error quickly
– Not slow down compilation of valid code
• Good error handling is not easy to achieve
5
Approaches to Syntax Error Recovery
• From simple to complex
– Panic mode
– Error productions
– Automatic local or global correction
• Not all are supported by all parser generators
6
Error Recovery: Panic Mode
• Simplest, most popular method
• When an error is detected:
– Discard tokens until one with a clear role is found
– Continue from there
• Such tokens are called synchronizing tokens
– Typically the statement or expression terminators
7
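A minimal sketch of what this skipping step might look like in a hand-written parser. The token names (SEMI, RBRACE, END) and the skip_to_sync helper are hypothetical, chosen only to illustrate the idea; they are not part of the course code or of Bison.

enum TOKEN { INT, PLUS, SEMI, RBRACE, END };

TOKEN *next;   // points at the next input token

// On a syntax error, discard tokens until a synchronizing token
// (here ';' or '}') or the end of input is reached, then resume.
void skip_to_sync() {
  while (*next != SEMI && *next != RBRACE && *next != END)
    ++next;
  if (*next != END)
    ++next;    // consume the synchronizing token as well
}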
Syntax Error Recovery: Panic Mode (Cont.)
• Consider the erroneous expression
(1 + + 2) + 3
• Panic-mode recovery:
– Skip ahead to next integer and then continue
• Bison: use the special terminal error to describe
how much input to skip
E → int | E + E | ( E ) | error int | ( error )
8
Syntax Error Recovery: Error Productions
• Idea: specify in the grammar known common mistakes
• Essentially promotes common errors to alternative syntax
• Example:
– Write 5 x instead of 5 * x
– Add the production E → … | E E
• Disadvantage
– Complicates the grammar
9
Error Recovery: Local and Global Correction
• Idea: find a correct “nearby” program
– Try token insertions and deletions
– Exhaustive search
• Disadvantages:
– Hard to implement
– Slows down parsing of correct programs
– “Nearby” is not necessarily “the intended” program
– Not all tools support it
10
Syntax Error Recovery: Past and Present
• Past
– Slow recompilation cycle (even once a day)
– Find as many errors in one cycle as possible
– Researchers could not let go of the topic
• Present
– Quick recompilation cycle
– Users tend to correct one error/cycle
– Complex error recovery is less compelling
– Panic-mode seems enough
11
Abstract Syntax Trees
• So far a parser traces the derivation of a
sequence of tokens
• The rest of the compiler needs a structural
representation of the program
• Abstract syntax trees
– Like parse trees but ignore some details
– Abbreviated as AST
12
Abstract Syntax Tree (Cont.)
• Consider the grammar
E → int | ( E ) | E + E
• And the string
5 + (2 + 3)
• After lexical analysis (a list of tokens)
int5 ‘+’ ‘(’ int2 ‘+’ int3 ‘)’
• During parsing we build a parse tree …
13
Example of Parse Tree
[Parse tree for 5 + (2 + 3): the root E derives E + E; the left E derives int5; the right E derives ( E ), whose inner E derives int2 + int3]
• Traces the operation of the parser
• Does capture the nesting structure
• But too much info
– Parentheses
– Single-successor nodes
14
Example of Abstract Syntax Tree
[AST for 5 + (2 + 3): PLUS(5, PLUS(2, 3))]
• Also captures the nesting structure
• But abstracts from the concrete syntax
=> more compact and easier to use
• An important data structure in a compiler
15
CFG: Semantic Actions
• This is what we’ll use to construct ASTs
• Each grammar symbol may have attributes
– For terminal symbols (lexical tokens) attributes can be
calculated by the lexer
• Each production may have an action
– Written as: X → Y1 … Yn { action }
– That can refer to or compute symbol attributes
16
Semantic Actions: An Example
• Consider the grammar
E → int | E + E | ( E )
• For each symbol X define an attribute X.val
– For terminals, val is the associated lexeme
– For non-terminals, val is the expression’s value (and is computed
from values of subexpressions)
• We annotate the grammar with actions:
E → int { E.val = int.val }
| E1 + E2 { E.val = E1.val + E2.val }
| ( E1 ) { E.val = E1.val }
17
Semantic Actions: An Example (Cont.)
• String: 5 + (2 + 3)
• Tokens: int5 ‘+’ ‘(’ int2 ‘+’ int3 ‘)’
Productions Equations
E → E1 + E2 E.val = E1.val + E2.val
E1 → int5 E1.val = int5.val = 5
E2 → ( E3 ) E2.val = E3.val
E3 → E4 + E5 E3.val = E4.val + E5.val
E4 → int2 E4.val = int2.val = 2
E5 → int3 E5.val = int3.val = 3
18
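Resolving these equations bottom-up for 5 + (2 + 3) gives, in one possible order:

E4.val = 2        E5.val = 3        E3.val = 2 + 3 = 5
E2.val = E3.val = 5        E1.val = int5.val = 5
E.val = E1.val + E2.val = 5 + 5 = 10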
Semantic Actions: Notes
• Semantic actions specify a system of equations
• Declarative Style
– Order of resolution is not specified
– The parser figures it out
• Imperative Style
– The order of evaluation is fixed
– Important if the actions manipulate global state
19
Semantic Actions: Notes
• We’ll explore actions as pure equations
– Style 1
– But note bison has a fixed order of evaluation for
actions
• Example:
E3.val = E4.val + E5.val
– Must compute E4.val and E5.val before E3.val
– We say that E3.val depends on E4.val and E5.val
20
Dependency Graph
[Dependency graph over the parse tree for 5 + (2 + 3): E → E1 + E2, E1 → int5, E2 → ( E3 ), E3 → E4 + E5, E4 → int2, E5 → int3, with an empty val slot at each E node and edges showing the dependencies]
• Each node labeled E has one slot for the val attribute
• Note the dependencies
21
Evaluating Attributes
• An attribute must be computed after all its
successors in the dependency graph have been
computed
– In previous example attributes can be computed
bottom-up
• Such an order exists when there are no cycles
– Cyclically defined attributes are not legal
22
Dependency Graph
[The same dependency graph with all val slots filled in: E4 = 2, E5 = 3, E3 = 5, E2 = 5, E1 = 5 (from int5), E = 10]
23
Semantic Actions: Notes (Cont.)
• Synthesized attributes
– Calculated from attributes of descendants in the parse
tree
– E.val is a synthesized attribute
– Can always be calculated in a bottom-up order
• Grammars with only synthesized attributes are
called S-attributed grammars
– Most common case
24
Inherited Attributes
• Another kind of attribute
• Calculated from attributes of parent and/or
siblings in the parse tree
• Example: a line calculator
25
A Line Calculator
• Each line contains an expression
E → int | E + E
• Each line is terminated with the = sign
L → E = | + E =
• In the second form, the value of the previous line is used as the starting value
• A program is a sequence of lines
P → ε | P L
26
Attributes for the Line Calculator
• Each E has a synthesized attribute val
– Calculated as before
• Each L has an attribute val
L→E= { L.val = E.val }
| +E= { L.val = E.val + L.prev }
• We need the value of the previous line
• We use an inherited attribute L.prev
27
Attributes for the Line Calculator (Cont.)
• Each P has a synthesized attribute val
– The value of its last line
P → ε    { P.val = 0 }
  | P1 L { L.prev = P1.val;
           P.val = L.val }
– Each L has an inherited attribute prev
– L.prev is inherited from sibling P1.val
• Example …
28
Example of Inherited Attributes
[Parse tree for the input line “+ 2 + 3 =” following an empty program: P → P1 L with P1 → ε; L → + E3 =; E3 → E4 + E5; E4 → int2; E5 → int3]
• val synthesized
• prev inherited
• All can be computed in bottom-up order
29
Example of Inherited Attributes
[The same tree with attribute values filled in: P1.val = 0, L.prev = 0, E4.val = 2, E5.val = 3, E3.val = 5, L.val = 5, P.val = 5]
• val synthesized
• prev inherited
• All can be computed in depth-first order
30
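A minimal sketch of how these equations could be evaluated once the lines have been parsed. The Line struct, its fields, and eval_program are hypothetical names used only for illustration; they are not part of the project code.

#include <vector>

// One already-parsed line: the value of its expression (E.val) and
// whether it used the “+ E =” form, which adds the previous line’s value.
struct Line { int expr_val; bool uses_prev; };

int eval_program(const std::vector<Line> &lines) {
  int p_val = 0;                          // P → ε      { P.val = 0 }
  for (const Line &l : lines) {
    int prev = p_val;                     // L.prev = P1.val (inherited)
    int l_val = l.uses_prev ? l.expr_val + prev   // L → + E =
                            : l.expr_val;         // L → E =
    p_val = l_val;                        // P.val = L.val (synthesized)
  }
  return p_val;                           // value of the last line
}

For the example above, eval_program({{5, true}}) returns 5, matching the attribute values in the tree.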
Semantic Actions: Notes (Cont.)
• Semantic actions can be used to build ASTs
• And many other things as well
– Also used for type checking, code generation,
computation, …
• Process is called syntax-directed translation
– Substantial generalization over CFGs
31
Constructing an AST
• We first define the AST data type
– Supplied by us for the project
• Consider an abstract tree type with two constructors:
mkleaf(n) — a leaf holding the integer n
mkplus(T1, T2) — a PLUS node with subtrees T1 and T2
32
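A minimal C++ sketch of such a tree type. The real AST classes are supplied with the project; Expr and these constructor functions are just illustrative names.

// A leaf holds an integer; a PLUS node holds two subtrees.
struct Expr {
  bool  is_leaf;
  int   value;          // meaningful only when is_leaf is true
  Expr *left, *right;   // meaningful only for PLUS nodes
};

Expr *mkleaf(int n)              { return new Expr{true,  n, nullptr, nullptr}; }
Expr *mkplus(Expr *t1, Expr *t2) { return new Expr{false, 0, t1, t2}; }

With these, the evaluation on the next two slides builds mkplus(mkleaf(5), mkplus(mkleaf(2), mkleaf(3))).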
Constructing an AST
• We define a synthesized attribute ast
– The values of the ast attribute are ASTs
– We assume that int.lexval is the value of the integer
lexeme
– Computed using semantic actions
E → int E.ast = mkleaf(int.lexval)
| E1 + E2 E.ast = mkplus(E1.ast, E2.ast)
| ( E1 ) E.ast = E1.ast
33
Abstract Syntax Tree Example
• Consider the string int5 ‘+’ ‘(’ int2 ‘+’ int3 ‘)’
• A bottom-up evaluation of the ast attribute:
E.ast = mkplus(mkleaf(5),
               mkplus(mkleaf(2), mkleaf(3)))
[Resulting AST: PLUS(5, PLUS(2, 3))]
34
Summary
• We can specify language syntax using CFG
• A parser will answer whether s ∈ L(G)
– … and will trace a parse tree
– … whose productions’ semantic actions build an AST
– … that we pass on to the rest of the compiler
35
Intro to Top-Down Parsing: The Idea
• The parse tree is constructed
– From the top
– From left to right
• Terminals are seen in order of appearance in the token stream:
t2 t5 t6 t8 t9
[Figure: a parse tree whose internal nodes are numbered in the order they are constructed, top-down and left to right, with leaves t2, t5, t6, t8, t9]
36
Recursive Descent Parsing
• Consider the grammar
E → T | T + E
T → int | int * T | ( E )
• Token stream is: ( int5 )
• Start with top-level non-terminal E
– Try the rules for E in order
37
Recursive Descent Parsing
E → T | T + E
T → int | int * T | ( E )
( int5 )
38
Recursive Descent Parsing
E → T | T + E
T → int | int * T | ( E )
( int5 )
39
Recursive Descent Parsing
E → T | T + E
T → int | int * T | ( E )
[Tree so far: E → T → int]
Mismatch: int is not ( ! Backtrack …
( int5 )
40
Recursive Descent Parsing
E → T | T + E
T → int | int * T | ( E )
( int5 )
41
Recursive Descent Parsing
E → T | T + E
T → int | int * T | ( E )
[Tree so far: E → T → int * T]
Mismatch: int is not ( ! Backtrack …
( int5 )
42
Recursive Descent Parsing
E → T | T + E
T → int | int * T | ( E )
( int5 )
43
Recursive Descent Parsing
E → T | T + E
T → int | int * T | ( E )
[Tree so far: E → T → ( E )]
Match! Advance input.
( int5 )
44
Recursive Descent Parsing
E → T | T + E
T → int | int * T | ( E )
[Tree so far: E → T → ( E )]
( int5 )
45
Recursive Descent Parsing
E → T | T + E
T → int | int * T | ( E )
[Tree so far: E → T → ( E ), now expanding the inner E]
( int5 )
46
Recursive Descent Parsing
E → T | T + E
T → int | int * T | ( E )
[Tree so far: E → T → ( E ), inner E → T → int]
Match! Advance input.
( int5 )
47
Recursive Descent Parsing
E → T | T + E
T → int | int * T | ( E )
[Tree so far: E → T → ( E ), inner E → T → int]
Match! Advance input.
( int5 )
48
Recursive Descent Parsing
E → T | T + E
T → int | int * T | ( E )
[Final tree: E → T → ( E ), inner E → T → int]
End of input, accept.
( int5 )
49
A Recursive Descent Parser. Preliminaries
• Let TOKEN be the type of tokens
– Special tokens INT, OPEN, CLOSE, PLUS, TIMES
• Let the global next point to the next token
50
A (Limited) Recursive Descent Parser (2)
• Define boolean functions that check the token
string for a match of
– A given token terminal
bool term(TOKEN tok) { return *next++ == tok; }
– The nth production of S:
bool Sn() { … }
– Try all productions of S:
bool S() { … }
51
A (Limited) Recursive Descent Parser (3)
• For production E → T
bool E1() { return T(); }
• For production E → T + E
bool E2() { return T() && term(PLUS) && E(); }
• For all productions of E (with backtracking)
bool E() {
TOKEN *save = next;
return (next = save, E1())
|| (next = save, E2()); }
52
A (Limited) Recursive Descent Parser (4)
• Functions for non-terminal T
bool T1() { return term(INT); }
bool T2() { return term(INT) && term(TIMES) && T(); }
bool T3() { return term(OPEN) && E() && term(CLOSE); }
bool T() {
  TOKEN *save = next;
  return (next = save, T1())
      || (next = save, T2())
      || (next = save, T3()); }
53
Recursive Descent Parsing. Notes.
• To start the parser
– Initialize next to point to first token
– Invoke E()
• Notice how this simulates the example parse
• Easy to implement by hand
– But not completely general
– Cannot backtrack once a production is successful
– Works for grammars where at most one production can succeed
for a non-terminal
54
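A possible driver for the functions above, assuming TOKEN is an enum containing the special tokens plus an END marker for end of input (the END marker and main are assumptions, not part of the slides):

// Tokens for the input “( int )”, terminated by an assumed END marker.
TOKEN input[] = { OPEN, INT, CLOSE, END };

int main() {
  next = input;                      // initialize next to the first token
  bool ok = E() && *next == END;     // parse, then require all input consumed
  return ok ? 0 : 1;
}

Checking *next == END matters because E() alone only reports that some prefix of the input was parsed.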
Example
E → T | T + E
Input: ( int )
T → int | int * T | ( E )
bool term(TOKEN tok) { return *next++ == tok; }
bool E1() { return T(); }
bool E2() { return T() && term(PLUS) && E(); }
bool E() {TOKEN *save = next; return (next = save, E1())
|| (next = save, E2()); }
bool T1() { return term(INT); }
bool T2() { return term(INT) && term(TIMES) && T(); }
bool T3() { return term(OPEN) && E() && term(CLOSE); }
bool T() { TOKEN *save = next; return (next = save, T1())
|| (next = save, T2())
|| (next = save, T3()); }
55
When Recursive Descent Does Not Work
• Consider a production S → S a
bool S1() { return S() && term(a); }
bool S() { return S1(); }
• S() goes into an infinite loop
• A left-recursive grammar has a non-terminal S
S →⁺ S α for some α
• Recursive descent does not work in such cases
56
Elimination of Left Recursion
• Consider the left-recursive grammar
S → S α | β
• S generates all strings starting with a β and
followed by a number of α
• Can rewrite using right-recursion
S → β S’
S’ → α S’ | ε
57
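As a concrete instance (a sketch using the small sub-grammar E → E + int | int, so β = int and α = + int):

E  → E + int | int        becomes        E  → int E’
                                         E’ → + int E’ | ε

Recursive descent on the rewritten grammar no longer loops. Using the term helper and token names from the earlier slides (E and Eprime here are illustrative and redefine E for this small grammar only):

bool Eprime();
bool E()      { return term(INT) && Eprime(); }
bool Eprime() { TOKEN *save = next;
                return (term(PLUS) && term(INT) && Eprime())
                    || (next = save, true); }   // ε production: reset and succeed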
More Elimination of Left-Recursion
• In general
S → S α1 | … | S αn | β1 | … | βm
• All strings derived from S start with one of
β1,…,βm and continue with several instances of
α1,…,αn
• Rewrite as
S → β1 S’ | … | βm S’
S’ → α1 S’ | … | αn S’ | ε
58
General Left Recursion
• The grammar
S → A α | δ
A → S β
is also left-recursive because
S →⁺ S β α
• This left-recursion can also be eliminated
• See Dragon Book for general algorithm
– Section 4.3
59
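One way to see it (a sketch of the idea; the Dragon Book gives the general algorithm): substitute the production for A into S, which exposes direct left recursion, then apply the rewrite from the previous slides:

S → A α | δ          substitute A → S β:
S → S β α | δ        eliminate the direct left recursion:
S  → δ S’
S’ → β α S’ | ε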
Summary of Recursive Descent
• Simple and general parsing strategy
– Left-recursion must be eliminated first
– … but that can be done automatically
• Historically unpopular because of backtracking
– Was thought to be too inefficient
– In practice, fast and simple on modern machines
• In practice, backtracking is eliminated by
restricting the grammar
60