Lecture 06

The document covers error handling in compilers, focusing on syntax-directed translation and recursive descent parsing. It discusses various types of errors, recovery methods, and the construction of abstract syntax trees (ASTs) to represent program structure. Additionally, it explains semantic actions for building ASTs and the principles of top-down parsing.


Error Handling

Syntax-Directed Translation
Recursive Descent Parsing

CS143
Lecture 6

Instructor: Fredrik Kjolstad


Slide design by Prof. Alex Aiken, with modifications
1
Announcements

• PA1 & WA1


– Due today at midnight

• PA2 & WA2


– Assigned today

2
Outline

• Extensions of CFG for parsing


– Precedence declarations
– Error handling
– Semantic actions

• Constructing an abstract syntax tree (AST)

• Recursive descent parsing

3
Error Handling

• Purpose of the compiler is


– To detect non-valid programs
– To translate the valid ones
• Many kinds of possible errors

Error kind     Example (C)              Detected by …
Lexical        … $ …                    Lexer
Syntax         … x *% …                 Parser
Semantic       … int x; y = x(3); …     Type checker
Correctness    your favorite program    Tester/User

4
Syntax Error Handling

• Error handler should


– Report errors accurately and clearly
– Recover from an error quickly
– Not slow down compilation of valid code

• Good error handling is not easy to achieve

5
Syntax Error Recovery

• Approaches from simple to complex


– Panic mode
– Error productions
– Automatic local or global correction

• Not all are supported by all parser generators

6
Error Recovery: Panic Mode

• Simplest, most popular method

• When an error is detected:


– Discard tokens until one with a clear role is found
– Continue from there

• Such tokens are called synchronizing tokens


– Typically the statement or expression terminators
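
A minimal sketch of the token-skipping step, assuming a token array terminated by an end-of-input marker. The names `TokenKind`, `is_sync`, and `panic_recover` are hypothetical and chosen only for illustration; a real parser would pick its own synchronizing set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical token kinds for illustration; not from the lecture. */
typedef enum { TOK_INT, TOK_PLUS, TOK_SEMI, TOK_EOF } TokenKind;

/* Assumption: statement terminators and end of input are the
 * synchronizing tokens. */
static bool is_sync(TokenKind t) {
    return t == TOK_SEMI || t == TOK_EOF;
}

/* Panic-mode recovery: starting at pos (where the error was detected),
 * discard tokens until a synchronizing token is found.
 * Returns the index at which parsing should resume: just past a ';',
 * or at TOK_EOF. Assumes the array is terminated by TOK_EOF. */
size_t panic_recover(const TokenKind *toks, size_t pos) {
    while (!is_sync(toks[pos]))
        pos++;
    if (toks[pos] == TOK_SEMI)
        pos++;  /* consume the terminator, resume at the next statement */
    return pos;
}
```

For the token sequence `int + + int ; int ;` with the error detected at the second `+`, the function skips the rest of the first statement and resumes at the `int` that starts the second one.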

7
Error Recovery: Panic Mode (Cont.)

• Consider the erroneous expression


(1 + + 2) + 3
• Panic-mode recovery:
– Skip ahead to next integer and then continue

• Bison: use the special terminal error to describe how much input to skip
    E → int | E + E | ( E ) | error int | ( error )

8
Error Recovery: Error Productions

• Idea: specify in the grammar known common mistakes

• Essentially promotes common errors to alternative syntax

• Example:
– Write 5 x instead of 5 * x
– Add the production E → … | E E

• Disadvantage
– Complicates the grammar

9
Error Recovery: Local and Global Correction

• Idea: find a correct “nearby” program


– Try token insertions and deletions
– Exhaustive search

• Disadvantages:
– Hard to implement
– Slows down parsing of correct programs
– “Nearby” is not necessarily “the intended” program
– Not supported by most tools

10
Syntax Error Recovery: Past and Present

• Past
– Slow recompilation cycle (even once a day)
– Find as many errors in one cycle as possible
– Researchers could not let go of the topic

• Present
– Quick recompilation cycle
– Users tend to correct one error/cycle
– Complex error recovery is less compelling
– Panic-mode seems enough

11
Abstract Syntax Trees

• So far a parser traces the derivation of a sequence of tokens

• The rest of the compiler needs a structural representation of the program

• Abstract syntax trees
– Like parse trees but ignore some details
– Abbreviated as AST

12
Abstract Syntax Trees (Cont.)

• Consider the grammar


E → int | ( E ) | E + E

• And the string


5 + (2 + 3)

• After lexical analysis (a list of tokens)


int5 ‘+’ ‘(‘ int2 ‘+’ int3 ‘)’

• During parsing we build a parse tree …

13
Example of Parse Tree

E
├─ E
│  └─ int5
├─ +
└─ E
   ├─ (
   ├─ E
   │  ├─ E
   │  │  └─ int2
   │  ├─ +
   │  └─ E
   │     └─ int3
   └─ )

• Traces the operation of the parser

• Does capture the nesting structure

• But too much info
– Parentheses
– Single-successor nodes

14
Example of Abstract Syntax Tree

PLUS
├─ 5
└─ PLUS
   ├─ 2
   └─ 3

• Also captures the nesting structure

• But abstracts from the concrete syntax
=> more compact and easier to use
• An important data structure in a compiler
15
Semantic Actions Extension to CFGs

• This is what we’ll use to construct ASTs

• Each grammar symbol may have attributes


– For terminal symbols (lexical tokens) attributes can be
calculated by the lexer

• Each production may have an action


– Written as X → Y1…Yn { action }
– That can refer to or compute symbol attributes

16
Semantic Actions: Example

• Consider the grammar


E → int | E + E | ( E )

• For each symbol X define an attribute X.val


– For terminals, val is the associated lexeme
– For non-terminals, val is the expression’s value (and is computed
from values of subexpressions)

• We annotate the grammar with actions:


E → int       { E.val = int.val }
  | E1 + E2   { E.val = E1.val + E2.val }
  | ( E1 )    { E.val = E1.val }

17
Semantic Actions: Example (Cont.)

• String: 5 + (2 + 3)
• Tokens: int5 ‘+’ ‘(‘ int2 ‘+’ int3 ‘)’

Productions        Equations
E  → E1 + E2       E.val  = E1.val + E2.val
E1 → int5          E1.val = int5.val = 5
E2 → ( E3 )        E2.val = E3.val
E3 → E4 + E5       E3.val = E4.val + E5.val
E4 → int2          E4.val = int2.val = 2
E5 → int3          E5.val = int3.val = 3

18
Semantic Actions: Notes

• Semantic actions specify a system of equations

• Declarative Style
– Order of resolution is not specified
– The parser figures it out

• Imperative Style
– The order of evaluation is fixed
– Important if the actions manipulate global state

19
Semantic Actions: Notes

• We’ll explore actions as pure equations
– But note bison has a fixed order of evaluation for actions

• Example:
E3.val = E4.val + E5.val
– Must compute E4.val and E5.val before E3.val
– We say that E3.val depends on E4.val and E5.val

20
Dependency Graph

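• Each node labeled E has one slot for the val attribute
• Note the dependencies

(val slots shown in brackets; empty slots are not yet computed)

E [ ]
├─ E1 [ ]
│  └─ int5 [5]
├─ +
└─ E2 [ ]
   ├─ (
   ├─ E3 [ ]
   │  ├─ E4 [ ]
   │  │  └─ int2 [2]
   │  ├─ +
   │  └─ E5 [ ]
   │     └─ int3 [3]
   └─ )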

21
Evaluating Attributes

• An attribute must be computed after all its successors in the dependency graph have been computed
– In the previous example, attributes can be computed bottom-up

• Such an order exists when there are no cycles
– Cyclically defined attributes are not legal

22
Dependency Graph

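(val slots shown in brackets, after evaluation)

E [10]
├─ E1 [5]
│  └─ int5 [5]
├─ +
└─ E2 [5]
   ├─ (
   ├─ E3 [5]
   │  ├─ E4 [2]
   │  │  └─ int2 [2]
   │  ├─ +
   │  └─ E5 [3]
   │     └─ int3 [3]
   └─ )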

23
Semantic Actions: Notes (Cont.)

• Synthesized attributes
– Calculated from attributes of descendants in the parse tree
– E.val is a synthesized attribute
– Can always be calculated in a bottom-up order

• Grammars with only synthesized attributes are called S-attributed grammars
– Most common case
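
Bottom-up evaluation of a synthesized attribute can be sketched as a post-order traversal of the parse tree. The `Node` and `Prod` types and the `eval` function below are hypothetical names for illustration, assuming the grammar E → int | E + E | ( E ).

```c
#include <stddef.h>

/* Hypothetical parse-tree node for E → int | E + E | ( E ). */
typedef enum { P_INT, P_PLUS, P_PAREN } Prod;

typedef struct Node {
    Prod prod;            /* which production built this node */
    int lexval;           /* integer lexeme, used when prod == P_INT */
    struct Node *left;    /* E1 for P_PLUS and P_PAREN */
    struct Node *right;   /* E2 for P_PLUS */
    int val;              /* the synthesized attribute E.val */
} Node;

/* Post-order traversal: each child's val is computed before the
 * parent's, which respects the dependency order. */
int eval(Node *e) {
    switch (e->prod) {
    case P_INT:   e->val = e->lexval;                      break;
    case P_PLUS:  e->val = eval(e->left) + eval(e->right); break;
    case P_PAREN: e->val = eval(e->left);                  break;
    }
    return e->val;
}
```

Because every attribute here depends only on attributes of child nodes, a single traversal is enough and no explicit dependency graph has to be built.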

24
Semantic Actions: Notes (Cont.)

• Semantic actions can be used to build ASTs

• And many other things as well


– Also used for type checking, code generation,
computation, …

• Process is called syntax-directed translation


– Substantial generalization over CFGs

25
Constructing an AST

• We first define the AST data type


– Supplied by us for the project
• Consider an abstract tree type with two constructors:

    mkleaf(n)       =  a leaf node labeled n
    mkplus(T1, T2)  =  a PLUS node with subtrees T1 and T2

26
Constructing an AST

• We define a synthesized attribute ast


– Values of the ast attribute are ASTs
– We assume that int.lexval is the value of the integer lexeme
– Computed using semantic actions

E → int       E.ast = mkleaf(int.lexval)
  | E1 + E2   E.ast = mkplus(E1.ast, E2.ast)
  | ( E1 )    E.ast = E1.ast
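
The two constructors could look roughly like this in C. This is a sketch under assumptions: the `Ast` struct layout, the `AstKind` tags, and `ast_free` are hypothetical and are not the AST type supplied for the course project.

```c
#include <stdlib.h>

/* Hypothetical AST representation with two node kinds. */
typedef enum { AST_LEAF, AST_PLUS } AstKind;

typedef struct Ast {
    AstKind kind;
    int value;                 /* meaningful for AST_LEAF only */
    struct Ast *left, *right;  /* meaningful for AST_PLUS only */
} Ast;

/* mkleaf(n): a leaf node holding the integer n. */
Ast *mkleaf(int n) {
    Ast *t = malloc(sizeof *t);
    if (!t) abort();           /* out of memory: give up in this sketch */
    t->kind = AST_LEAF;
    t->value = n;
    t->left = t->right = NULL;
    return t;
}

/* mkplus(t1, t2): a PLUS node whose children are t1 and t2. */
Ast *mkplus(Ast *t1, Ast *t2) {
    Ast *t = malloc(sizeof *t);
    if (!t) abort();
    t->kind = AST_PLUS;
    t->value = 0;
    t->left = t1;
    t->right = t2;
    return t;
}

/* Release a tree built with the constructors above. */
void ast_free(Ast *t) {
    if (!t) return;
    ast_free(t->left);
    ast_free(t->right);
    free(t);
}
```

With these definitions, the action for ( E1 ) simply passes the child's tree through, so parentheses leave no node in the AST.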

27
Abstract Syntax Tree Example

• Consider the string int5 ‘+’ ‘(’ int2 ‘+’ int3 ‘)’
• A bottom-up evaluation of the ast attribute:
    E.ast = mkplus(mkleaf(5),
                   mkplus(mkleaf(2), mkleaf(3)))

PLUS
├─ 5
└─ PLUS
   ├─ 2
   └─ 3

28
Summary

• We can specify language syntax using CFG

• A parser will answer whether s ∈ L(G)


– … and will trace a parse tree
– … in whose productions we build an AST
– … that we pass on to the rest of the compiler

29
Intro to Top-Down Parsing: The Idea

• The parse tree is constructed
– From the top
– From left to right

• Terminals are seen in order of appearance in the token stream:
    t2 t5 t6 t8 t9

(nodes numbered in order of construction)

1
├─ t2
├─ 3
│  ├─ 4
│  │  ├─ t5
│  │  └─ t6
│  └─ 7
│     └─ t8
└─ t9

30
Recursive Descent Parsing

• Consider the grammar


E → T | T + E
T → int | int * T | ( E )

• Token stream is: ( int5 )

• Start with top-level non-terminal E


– Try the rules for E in order

31
Recursive Descent Parsing (Cont.)

E → T | T + E
T → int | int * T | ( E )

Input: ( int5 )

• Try E → T

• Try T → int
– Mismatch: int is not ( ! Backtrack …

• Try T → int * T
– Mismatch: int is not ( ! Backtrack …

• Try T → ( E )
– ( matches. Advance input.

• For the inner E, try E → T

• Try T → int
– int matches int5. Advance input.

• ) matches. Advance input.

• End of input, accept.

Resulting parse tree:

E
└─ T
   ├─ (
   ├─ E
   │  └─ T
   │     └─ int
   └─ )

43
A Recursive Descent Parser: Preliminaries

• Let TOKEN be the type of tokens


– Special tokens INT, OPEN, CLOSE, PLUS, TIMES

• Let the global next point to the next token

44
A (Limited) Recursive Descent Parser (2)

• Define boolean functions that check the token string for a match of
– A given token terminal
    bool term(TOKEN tok) { return *next++ == tok; }
– The nth production of S:
    bool Sn() { … }
– Try all productions of S:
    bool S() { … }

45
A (Limited) Recursive Descent Parser (3)

• For production E → T
    bool E1() { return T(); }

• For production E → T + E
    bool E2() { return T() && term(PLUS) && E(); }

• For all productions of E (with backtracking)
    bool E() {
      TOKEN *save = next;
      return (next = save, E1())
          || (next = save, E2());
    }

46
A (Limited) Recursive Descent Parser (4)

• Functions for non-terminal T


    bool T1() { return term(INT); }
    bool T2() { return term(INT) && term(TIMES) && T(); }
    bool T3() { return term(OPEN) && E() && term(CLOSE); }

    bool T() {
      TOKEN *save = next;
      return (next = save, T1())
          || (next = save, T2())
          || (next = save, T3());
    }

47
Recursive Descent Parsing. Notes.

• To start the parser


– Initialize next to point to first token
– Invoke E()

• Easy to implement by hand


– But not completely general
– Cannot backtrack once a production is successful
– Works for grammars where at most one production can succeed
for a non-terminal

48
Example

E → T | T + E                      ( int )
T → int | int * T | ( E )

bool term(TOKEN tok) { return *next++ == tok; }

bool E1() { return T(); }
bool E2() { return T() && term(PLUS) && E(); }

bool E() { TOKEN *save = next; return (next = save, E1())
                                   || (next = save, E2()); }

bool T1() { return term(INT); }
bool T2() { return term(INT) && term(TIMES) && T(); }
bool T3() { return term(OPEN) && E() && term(CLOSE); }

bool T() { TOKEN *save = next; return (next = save, T1())
                                   || (next = save, T2())
                                   || (next = save, T3()); }
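
The functions above can be assembled into a compilable unit roughly as follows. The `END` sentinel token, the `const` qualifier on `next`, and the `parse` driver that requires all input to be consumed are assumptions added for this sketch; the lecture defines only the parsing functions themselves.

```c
#include <stdbool.h>

/* Token kinds from the lecture, plus END as an assumed end-of-input
 * sentinel that no production ever asks for. */
typedef enum { INT, OPEN, CLOSE, PLUS, TIMES, END } TOKEN;

static const TOKEN *next;   /* points to the next unread token */

static bool E(void);
static bool T(void);

/* Consume one token and report whether it was the expected one. */
static bool term(TOKEN tok) { return *next++ == tok; }

/* E → T | T + E */
static bool E1(void) { return T(); }
static bool E2(void) { return T() && term(PLUS) && E(); }
static bool E(void) {
    const TOKEN *save = next;
    return (next = save, E1())
        || (next = save, E2());
}

/* T → int | int * T | ( E ) */
static bool T1(void) { return term(INT); }
static bool T2(void) { return term(INT) && term(TIMES) && T(); }
static bool T3(void) { return term(OPEN) && E() && term(CLOSE); }
static bool T(void) {
    const TOKEN *save = next;
    return (next = save, T1())
        || (next = save, T2())
        || (next = save, T3());
}

/* Hypothetical driver: accept only if E() succeeds and the whole
 * END-terminated token array has been consumed. */
bool parse(const TOKEN *tokens) {
    next = tokens;
    return E() && *next == END;
}
```

Calling `parse` on the tokens for `( int )` succeeds. On the tokens for `int * int` it fails even though the string is in the language: T1 succeeds on the first `int`, so T() returns without ever trying T2, and nothing backtracks into T() once it has succeeded. This is the limitation that the word "Limited" in the slide titles refers to.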

49
When Recursive Descent Does Not Work

• Consider a production S → S a
bool S1() { return S() && term(a); }
bool S() { return S1(); }

• S() goes into an infinite loop

• A left-recursive grammar has a non-terminal S such that
    S →+ S α for some α

• Recursive descent does not work in such cases

50
Elimination of Left Recursion

• Consider the left-recursive grammar


S→Sα|β

• S generates all strings starting with a β and followed by a number of α

• Can rewrite using right-recursion


S → β S’
S’ → α S’ | ε
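
The right-recursive form maps directly onto recursive descent functions. The sketch below assumes, purely for illustration, that β is the single character 'b' and α is the single character 'a', so the language is a 'b' followed by any number of 'a'. The names `Sprime` and `matches` are hypothetical.

```c
#include <stdbool.h>

static const char *next;   /* next unread input character */

/* Consume one character and report whether it was the expected one. */
static bool term(char c) { return *next++ == c; }

static bool Sprime(void);

/* S → β S'   (β assumed to be 'b') */
static bool S(void) { return term('b') && Sprime(); }

/* S' → α S' | ε   (α assumed to be 'a') */
static bool Sprime(void) {
    const char *save = next;
    if (term('a') && Sprime())
        return true;
    next = save;           /* ε-production: consume nothing */
    return true;
}

/* Accept only if S matches the entire NUL-terminated input. */
bool matches(const char *input) {
    next = input;
    return S() && *next == '\0';
}
```

Unlike the left-recursive S → S α, every recursive call here happens only after a character has been consumed, so the recursion terminates.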

51
More Elimination of Left-Recursion

• In general
S → S α1 | … | S αn | β1 | … | βm
• All strings derived from S start with one of
β1,…,βm and continue with several instances of
α1,…,αn
• Rewrite as
S → β1 S’ | … | βm S’
S’ → α1 S’ | … | αn S’ | ε

52
General Left Recursion

• The grammar
S→Aα|δ
A→Sβ
is also left-recursive because
S →+ S β α

• This left-recursion can also be eliminated

• See Dragon Book for general algorithm


– Section 4.3

53
Summary of Recursive Descent

• Simple and general parsing strategy


– Left-recursion must be eliminated first
– … but that can be done automatically

• Historically unpopular because of backtracking
– Was thought to be too inefficient
– In practice, with some tweaks, fast and simple on modern machines

• Backtracking can be controlled by restricting the grammar
54
