Error Handling
Syntax-Directed Translation
Recursive Descent Parsing
CS143
Lecture 6
Instructor: Fredrik Kjolstad
Slide design by Prof. Alex Aiken, with modifications
1
Announcements
• PA1 & WA1
– Due today at midnight
• PA2 & WA2
– Assigned today
2
Outline
• Extensions of CFG for parsing
– Precedence declarations
– Error handling
– Semantic actions
• Constructing an abstract syntax tree (AST)
• Recursive descent
3
Error Handling
• The purpose of the compiler is
– To detect invalid programs
– To translate the valid ones
• Many kinds of possible errors
Error kind Example (C) Detected by …
Lexical …$… Lexer
Syntax … x *% … Parser
Semantic … int x; y = x(3); … Type checker
Correctness your favorite program Tester/User
4
Syntax Error Handling
• Error handler should
– Report errors accurately and clearly
– Recover from an error quickly
– Not slow down compilation of valid code
• Good error handling is not easy to achieve
5
Approaches to Syntax Error Recovery
• From simple to complex
– Panic mode
– Error productions
– Automatic local or global correction
• Not all are supported by all parser generators
6
Error Recovery: Panic Mode
• Simplest, most popular method
• When an error is detected:
– Discard tokens until one with a clear role is found
– Continue from there
• Such tokens are called synchronizing tokens
– Typically the statement or expression terminators
7
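A minimal sketch of what this skipping step might look like in a hand-written parser. The token names (SEMI, RBRACE, END) and the skip_to_sync helper are hypothetical, chosen only to illustrate the idea; they are not part of the course code or of Bison.

enum TOKEN { INT, PLUS, SEMI, RBRACE, END };

TOKEN *next;   // points at the next input token

// On a syntax error, discard tokens until a synchronizing token
// (here ';' or '}') or the end of input is reached, then resume.
void skip_to_sync() {
  while (*next != SEMI && *next != RBRACE && *next != END)
    ++next;
  if (*next != END)
    ++next;    // consume the synchronizing token as well
}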
Syntax Error Recovery: Panic Mode (Cont.)
• Consider the erroneous expression
(1 + + 2) + 3
• Panic-mode recovery:
– Skip ahead to next integer and then continue
• Bison: use the special terminal error to describe
how much input to skip
E → int | E + E | ( E ) | error int | ( error )
8
Syntax Error Recovery: Error Productions
• Idea: specify in the grammar known common mistakes
• Essentially promotes common errors to alternative syntax
• Example:
– Write 5 x instead of 5 * x
– Add the production E → … | E E
• Disadvantage
– Complicates the grammar
9
Error Recovery: Local and Global Correction
• Idea: find a correct “nearby” program
– Try token insertions and deletions
– Exhaustive search
• Disadvantages:
– Hard to implement
– Slows down parsing of correct programs
– “Nearby” is not necessarily “the intended” program
– Not all tools support it
10
Syntax Error Recovery: Past and Present
• Past
– Slow recompilation cycle (even once a day)
– Find as many errors in one cycle as possible
– Researchers could not let go of the topic
• Present
– Quick recompilation cycle
– Users tend to correct one error/cycle
– Complex error recovery is less compelling
– Panic-mode seems enough
11
Abstract Syntax Trees
• So far a parser traces the derivation of a
sequence of tokens
• The rest of the compiler needs a structural
representation of the program
• Abstract syntax trees
– Like parse trees but ignore some details
– Abbreviated as AST
12
Abstract Syntax Tree (Cont.)
• Consider the grammar
E → int | ( E ) | E + E
• And the string
5 + (2 + 3)
• After lexical analysis (a list of tokens)
int5 ‘+’ ‘(’ int2 ‘+’ int3 ‘)’
• During parsing we build a parse tree …
13
Example of Parse Tree
[Parse tree for 5 + (2 + 3): the root E derives E + E; the left E derives int5; the right E derives ( E ), whose inner E derives int2 + int3]
• Traces the operation of the parser
• Does capture the nesting structure
• But too much info
– Parentheses
– Single-successor nodes
14
Example of Abstract Syntax Tree
[AST for 5 + (2 + 3): PLUS(5, PLUS(2, 3))]
• Also captures the nesting structure
• But abstracts from the concrete syntax
=> more compact and easier to use
• An important data structure in a compiler
15
CFG: Semantic Actions
• This is what we’ll use to construct ASTs
• Each grammar symbol may have attributes
– For terminal symbols (lexical tokens) attributes can be
calculated by the lexer
• Each production may have an action
– Written as: X → Y1 … Yn { action }
– That can refer to or compute symbol attributes
16
Semantic Actions: An Example
• Consider the grammar
E → int | E + E | ( E )
• For each symbol X define an attribute X.val
– For terminals, val is the associated lexeme
– For non-terminals, val is the expression’s value (and is computed
from values of subexpressions)
• We annotate the grammar with actions:
E → int { E.val = int.val }
| E1 + E2 { E.val = E1.val + E2.val }
| ( E1 ) { E.val = E1.val }
17
Semantic Actions: An Example (Cont.)
• String: 5 + (2 + 3)
• Tokens: int5 ‘+’ ‘(’ int2 ‘+’ int3 ‘)’
Productions Equations
E → E1 + E2 E.val = E1.val + E2.val
E1 → int5 E1.val = int5.val = 5
E2 → ( E3 ) E2.val = E3.val
E3 → E4 + E5 E3.val = E4.val + E5.val
E4 → int2 E4.val = int2.val = 2
E5 → int3 E5.val = int3.val = 3
18
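Resolving these equations bottom-up for 5 + (2 + 3) gives, in one possible order:

E4.val = 2        E5.val = 3        E3.val = 2 + 3 = 5
E2.val = E3.val = 5        E1.val = int5.val = 5
E.val = E1.val + E2.val = 5 + 5 = 10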
Semantic Actions: Notes
• Semantic actions specify a system of equations
• Declarative Style
– Order of resolution is not specified
– The parser figures it out
• Imperative Style
– The order of evaluation is fixed
– Important if the actions manipulate global state
19
Semantic Actions: Notes
• We’ll explore actions as pure equations
– Style 1
– But note bison has a fixed order of evaluation for
actions
• Example:
E3.val = E4.val + E5.val
– Must compute E4.val and E5.val before E3.val
– We say that E3.val depends on E4.val and E5.val
20
Dependency Graph
[Dependency graph over the parse tree for 5 + (2 + 3): E → E1 + E2, E1 → int5, E2 → ( E3 ), E3 → E4 + E5, E4 → int2, E5 → int3, with an empty val slot at each E node and edges showing the dependencies]
• Each node labeled E has one slot for the val attribute
• Note the dependencies
21
Evaluating Attributes
• An attribute must be computed after all its
successors in the dependency graph have been
computed
– In previous example attributes can be computed
bottom-up
• Such an order exists when there are no cycles
– Cyclically defined attributes are not legal
22
Dependency Graph
[The same dependency graph with all val slots filled in: E4 = 2, E5 = 3, E3 = 5, E2 = 5, E1 = 5 (from int5), E = 10]
23
Semantic Actions: Notes (Cont.)
• Synthesized attributes
– Calculated from attributes of descendants in the parse
tree
– E.val is a synthesized attribute
– Can always be calculated in a bottom-up order
• Grammars with only synthesized attributes are
called S-attributed grammars
– Most common case
24
Inherited Attributes
• Another kind of attribute
• Calculated from attributes of parent and/or
siblings in the parse tree
• Example: a line calculator
25
A Line Calculator
• Each line contains an expression
E → int | E + E
• Each line is terminated with the = sign
L → E = | + E =
• In the second form, the value of the previous line is used as the starting value
• A program is a sequence of lines
P → ε | P L
26
Attributes for the Line Calculator
• Each E has a synthesized attribute val
– Calculated as before
• Each L has an attribute val
L→E= { L.val = E.val }
| +E= { L.val = E.val + L.prev }
• We need the value of the previous line
• We use an inherited attribute L.prev
27
Attributes for the Line Calculator (Cont.)
• Each P has a synthesized attribute val
– The value of its last line
P → ε    { P.val = 0 }
  | P1 L { L.prev = P1.val;
           P.val = L.val }
– Each L has an inherited attribute prev
– L.prev is inherited from sibling P1.val
• Example …
28
Example of Inherited Attributes
[Parse tree for the input line “+ 2 + 3 =” following an empty program: P → P1 L with P1 → ε; L → + E3 =; E3 → E4 + E5; E4 → int2; E5 → int3]
• val synthesized
• prev inherited
• All can be computed in bottom-up order
29
Example of Inherited Attributes
[The same tree with attribute values filled in: P1.val = 0, L.prev = 0, E4.val = 2, E5.val = 3, E3.val = 5, L.val = 5, P.val = 5]
• val synthesized
• prev inherited
• All can be computed in depth-first order
30
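A minimal sketch of how these equations could be evaluated once the lines have been parsed. The Line struct, its fields, and eval_program are hypothetical names used only for illustration; they are not part of the project code.

#include <vector>

// One already-parsed line: the value of its expression (E.val) and
// whether it used the “+ E =” form, which adds the previous line’s value.
struct Line { int expr_val; bool uses_prev; };

int eval_program(const std::vector<Line> &lines) {
  int p_val = 0;                          // P → ε      { P.val = 0 }
  for (const Line &l : lines) {
    int prev = p_val;                     // L.prev = P1.val (inherited)
    int l_val = l.uses_prev ? l.expr_val + prev   // L → + E =
                            : l.expr_val;         // L → E =
    p_val = l_val;                        // P.val = L.val (synthesized)
  }
  return p_val;                           // value of the last line
}

For the example above, eval_program({{5, true}}) returns 5, matching the attribute values in the tree.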
Semantic Actions: Notes (Cont.)
• Semantic actions can be used to build ASTs
• And many other things as well
– Also used for type checking, code generation,
computation, …
• Process is called syntax-directed translation
– Substantial generalization over CFGs
31
Constructing an AST
• We first define the AST data type
– Supplied by us for the project
• Consider an abstract tree type with two constructors:
mkleaf(n) — a leaf holding the integer n
mkplus(T1, T2) — a PLUS node with subtrees T1 and T2
32
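A minimal C++ sketch of such a tree type. The real AST classes are supplied with the project; Expr and these constructor functions are just illustrative names.

// A leaf holds an integer; a PLUS node holds two subtrees.
struct Expr {
  bool  is_leaf;
  int   value;          // meaningful only when is_leaf is true
  Expr *left, *right;   // meaningful only for PLUS nodes
};

Expr *mkleaf(int n)              { return new Expr{true,  n, nullptr, nullptr}; }
Expr *mkplus(Expr *t1, Expr *t2) { return new Expr{false, 0, t1, t2}; }

With these, the evaluation on the next two slides builds mkplus(mkleaf(5), mkplus(mkleaf(2), mkleaf(3))).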
Constructing an AST
• We define a synthesized attribute ast
– The values of the ast attribute are ASTs
– We assume that int.lexval is the value of the integer
lexeme
– Computed using semantic actions
E → int E.ast = mkleaf(int.lexval)
| E1 + E2 E.ast = mkplus(E1.ast, E2.ast)
| ( E1 ) E.ast = E1.ast
33
Abstract Syntax Tree Example
• Consider the string int5 ‘+’ ‘(’ int2 ‘+’ int3 ‘)’
• A bottom-up evaluation of the ast attribute:
E.ast = mkplus(mkleaf(5),
               mkplus(mkleaf(2), mkleaf(3)))
[Resulting AST: PLUS(5, PLUS(2, 3))]
34
Summary
• We can specify language syntax using CFG
• A parser will answer whether s ∈ L(G)
– … and will trace a parse tree
– … whose productions’ semantic actions build an AST
– … that we pass on to the rest of the compiler
35
Intro to Top-Down Parsing: The Idea
• The parse tree is constructed
– From the top
– From left to right
• Terminals are seen in order of appearance in the token stream:
t2 t5 t6 t8 t9
[Figure: a parse tree whose internal nodes are numbered in the order they are constructed, top-down and left to right, with leaves t2, t5, t6, t8, t9]
36
Recursive Descent Parsing
• Consider the grammar
E → T | T + E
T → int | int * T | ( E )
• Token stream is: ( int5 )
• Start with top-level non-terminal E
– Try the rules for E in order
37
Recursive Descent Parsing
E → T | T + E
T → int | int * T | ( E )
( int5 )
38
Recursive Descent Parsing
E → T | T + E
T → int | int * T | ( E )
( int5 )
39
Recursive Descent Parsing
E → T | T + E
T → int | int * T | ( E )
[Tree so far: E → T → int]
Mismatch: int is not ( ! Backtrack …
( int5 )
40
Recursive Descent Parsing
E → T | T + E
T → int | int * T | ( E )
( int5 )
41
Recursive Descent Parsing
E → T | T + E
T → int | int * T | ( E )
[Tree so far: E → T → int * T]
Mismatch: int is not ( ! Backtrack …
( int5 )
42
Recursive Descent Parsing
E → T | T + E
T → int | int * T | ( E )
( int5 )
43
Recursive Descent Parsing
E → T | T + E
T → int | int * T | ( E )
[Tree so far: E → T → ( E )]
Match! Advance input.
( int5 )
44
Recursive Descent Parsing
E → T | T + E
T → int | int * T | ( E )
[Tree so far: E → T → ( E )]
( int5 )
45
Recursive Descent Parsing
E → T | T + E
T → int | int * T | ( E )
[Tree so far: E → T → ( E ), now expanding the inner E]
( int5 )
46
Recursive Descent Parsing
E → T | T + E
T → int | int * T | ( E )
[Tree so far: E → T → ( E ), inner E → T → int]
Match! Advance input.
( int5 )
47
Recursive Descent Parsing
E → T | T + E
T → int | int * T | ( E )
[Tree so far: E → T → ( E ), inner E → T → int]
Match! Advance input.
( int5 )
48
Recursive Descent Parsing
E → T | T + E
T → int | int * T | ( E )
[Final tree: E → T → ( E ), inner E → T → int]
End of input, accept.
( int5 )
49
A Recursive Descent Parser. Preliminaries
• Let TOKEN be the type of tokens
– Special tokens INT, OPEN, CLOSE, PLUS, TIMES
• Let the global next point to the next token
50
A (Limited) Recursive Descent Parser (2)
• Define boolean functions that check the token
string for a match of
– A given token terminal
bool term(TOKEN tok) { return *next++ == tok; }
– The nth production of S:
bool Sn() { … }
– Try all productions of S:
bool S() { … }
51
A (Limited) Recursive Descent Parser (3)
• For production E → T
bool E1() { return T(); }
• For production E → T + E
bool E2() { return T() && term(PLUS) && E(); }
• For all productions of E (with backtracking)
bool E() {
TOKEN *save = next;
return (next = save, E1())
|| (next = save, E2()); }
52
A (Limited) Recursive Descent Parser (4)
• Functions for non-terminal T
bool T1() { return term(INT); }
bool T2() { return term(INT) && term(TIMES) && T(); }
bool T3() { return term(OPEN) && E() && term(CLOSE); }
bool T() {
  TOKEN *save = next;
  return (next = save, T1())
      || (next = save, T2())
      || (next = save, T3()); }
53
Recursive Descent Parsing. Notes.
• To start the parser
– Initialize next to point to first token
– Invoke E()
• Notice how this simulates the example parse
• Easy to implement by hand
– But not completely general
– Cannot backtrack once a production is successful
– Works for grammars where at most one production can succeed
for a non-terminal
54
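A possible driver for the functions above, assuming TOKEN is an enum containing the special tokens plus an END marker for end of input (the END marker and main are assumptions, not part of the slides):

// Tokens for the input “( int )”, terminated by an assumed END marker.
TOKEN input[] = { OPEN, INT, CLOSE, END };

int main() {
  next = input;                      // initialize next to the first token
  bool ok = E() && *next == END;     // parse, then require all input consumed
  return ok ? 0 : 1;
}

Checking *next == END matters because E() alone only reports that some prefix of the input was parsed.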
Example
E → T | T + E
Input: ( int )
T → int | int * T | ( E )
bool term(TOKEN tok) { return *next++ == tok; }
bool E1() { return T(); }
bool E2() { return T() && term(PLUS) && E(); }
bool E() {TOKEN *save = next; return (next = save, E1())
|| (next = save, E2()); }
bool T1() { return term(INT); }
bool T2() { return term(INT) && term(TIMES) && T(); }
bool T3() { return term(OPEN) && E() && term(CLOSE); }
bool T() { TOKEN *save = next; return (next = save, T1())
|| (next = save, T2())
|| (next = save, T3()); }
55
When Recursive Descent Does Not Work
• Consider a production S → S a
bool S1() { return S() && term(a); }
bool S() { return S1(); }
• S() goes into an infinite loop
• A left-recursive grammar has a non-terminal S
S →⁺ S α for some α
• Recursive descent does not work in such cases
56
Elimination of Left Recursion
• Consider the left-recursive grammar
S → S α | β
• S generates all strings starting with a β and
followed by a number of α
• Can rewrite using right-recursion
S → β S’
S’ → α S’ | ε
57
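As a concrete instance (a sketch using the small sub-grammar E → E + int | int, so β = int and α = + int):

E  → E + int | int        becomes        E  → int E’
                                         E’ → + int E’ | ε

Recursive descent on the rewritten grammar no longer loops. Using the term helper and token names from the earlier slides (E and Eprime here are illustrative and redefine E for this small grammar only):

bool Eprime();
bool E()      { return term(INT) && Eprime(); }
bool Eprime() { TOKEN *save = next;
                return (term(PLUS) && term(INT) && Eprime())
                    || (next = save, true); }   // ε production: reset and succeed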
More Elimination of Left-Recursion
• In general
S → S α1 | … | S αn | β1 | … | βm
• All strings derived from S start with one of
β1,…,βm and continue with several instances of
α1,…,αn
• Rewrite as
S → β1 S’ | … | βm S’
S’ → α1 S’ | … | αn S’ | ε
58
General Left Recursion
• The grammar
S → A α | δ
A → S β
is also left-recursive because
S →⁺ S β α
• This left-recursion can also be eliminated
• See Dragon Book for general algorithm
– Section 4.3
59
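One way to see it (a sketch of the idea; the Dragon Book gives the general algorithm): substitute the production for A into S, which exposes direct left recursion, then apply the rewrite from the previous slides:

S → A α | δ          substitute A → S β:
S → S β α | δ        eliminate the direct left recursion:
S  → δ S’
S’ → β α S’ | ε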
Summary of Recursive Descent
• Simple and general parsing strategy
– Left-recursion must be eliminated first
– … but that can be done automatically
• Historically unpopular because of backtracking
– Was thought to be too inefficient
– In practice, fast and simple on modern machines
• In practice, backtracking is eliminated by
restricting the grammar
60