02 Chapter 03

Chapter 3 covers the syntax and semantics of programming languages, detailing the importance of clear definitions for valid programs. It introduces formal methods such as BNF and EBNF for describing syntax, as well as the concept of attribute grammars for static semantics. The chapter also discusses the challenges of ambiguity in grammars and techniques to resolve them.

Uploaded by

Lucky John

Chapter 3

Describing Syntax and Semantics

ISBN 0-321-19362-8
Syntax and Semantics

Learning Outcomes:
By the end of this module you will be able to:
1. define the terms syntax and semantics of a programming
language
2. use BNF to describe the syntax of a programming language
3. explain the differences among BNF, EBNF and attribute
grammars
4. outline techniques for describing dynamic semantics of
programming languages

Copyright © 2004 Pearson Addison-Wesley. All rights reserved. 3-2


Topics

• Introduction
• The General Problem of Describing Syntax
• Formal Methods of Describing Syntax
• Attribute Grammars
• Describing the Meanings of Programs:
Dynamic Semantics



Motivation
• What happens when we attempt compiling this program?

class 5A{
    static final int n = 3;
    public static void main(int x){
        if x > 0
            n += x;
        else n /= x;
    }
}

• A compiler is based on clear rules that define a language’s
syntax and semantics


Introduction
• Every programming language must have a precise,
understandable definition of what makes a valid and
meaningful program
• Who must use language definitions?
– Other language designers
– Implementors
– Programmers (the users of the language)
• Syntax - the form or structure of the expressions,
statements, and program units
• Semantics - the meaning of the expressions,
statements, and program units
• The form of a statement should suggest what the statement
does

Syntax
• A language’s syntax can be low-level or high-level

• Low-level syntax
– for low-level language constructs like identifiers, numbers etc
– described using regular grammars, regular expressions or
syntax diagrams
– basis for lexers

• High-level syntax
– for language constructs like expressions, statements etc
– described using BNF/CFG (to be discussed shortly)
– basis for parsers



Components of a Language
• An alphabet: a set of characters from which all elements
of the language are constructed
– What is Java’s alphabet set?

• A lexeme (word): a sequence of characters from the alphabet
– Examples: *, sum, class

• A token: a category of lexemes
– Examples: identifier, literal, operator, reserved word

• A sentence: a sequence of lexemes/words
– Example: sum = sum + 1

• A program: a sequence of sentences

• A language: a set of programs/sentences
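
The relationship between lexemes and tokens can be sketched with a tiny lexer. This is only an illustration: the token categories follow the slide, but the regular expressions, the `RESERVED` set and the function name `tokenize` are assumptions made for this example, not part of any real compiler.

```python
import re

# Hypothetical reserved words and token patterns (assumed for illustration).
RESERVED = {"class", "if", "else", "int"}
TOKEN_SPEC = [
    ("literal", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("operator", r"[=+\-*/]"),
    ("skip", r"\s+"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def tokenize(sentence):
    """Return a list of (token, lexeme) pairs for the lexemes in a sentence."""
    pairs = []
    pos = 0
    while pos < len(sentence):
        match = PATTERN.match(sentence, pos)
        if match is None:
            raise ValueError(f"character not in the alphabet: {sentence[pos]!r}")
        pos = match.end()
        kind, lexeme = match.lastgroup, match.group()
        if kind == "skip":          # whitespace separates lexemes but is not one
            continue
        if kind == "identifier" and lexeme in RESERVED:
            kind = "reserved word"  # reserved words look like identifiers
        pairs.append((kind, lexeme))
    return pairs


for token, lexeme in tokenize("sum = sum + 1"):
    print(f"{lexeme!r:8} is a lexeme of token category {token}")
```

Both occurrences of sum are distinct lexemes in the sentence, yet they fall into the same token category, identifier.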


Describing Syntax
• The syntax rules in a language specify which sentences
belong to the language (i.e., are valid)

• There are two distinct approaches for describing syntax
formally:
1. Recognizers - used in compilers (we will look at these in Chapter 4)
• Determine whether an input sentence is valid (belongs to the
language) or not
• Example: Parser

2. Generators – generate the sentences of a language (what we'll
study in this chapter)
• Determine whether the syntax of a particular statement is
correct by comparing it with the structure of a language
generator
• Example: Grammars



Describing Syntax using Grammars
• Context-Free Grammars (CFGs)
– Developed by Noam Chomsky in the mid-1950s
– Language generators, meant to describe the syntax of natural
languages

• Backus-Naur Form (BNF) (1959)
– Invented by John Backus to describe Algol 58
– BNF is equivalent to context-free grammars

• BNF is now the most popular method for concisely
describing syntax of programming languages

• BNF is a metalanguage, that is, a language used to
describe another language.



Describing Syntax using BNF Grammars
• BNF uses abstractions to represent classes of syntactic
structures
– These abstractions are called nonterminal symbols

<while_stmt> -> while ( <logic_expr> ) <stmt>

• This is a rule that describes the structure of a while
statement

• Lexemes and tokens in the rule are called terminal symbols

• A rule has a left-hand side (LHS) and a right-hand side
(RHS)

• A grammar is a finite nonempty set of rules


Describing Syntactic Lists
• A nonterminal symbol can have more than one RHS

<stmt> -> <single_stmt>
| begin <stmt_list> end

• Syntactic lists are described using recursion

<ident_list> -> ident
| ident, <ident_list>

• Notice that this rule is recursive; the LHS occurs in its
RHS
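
A recursive rule maps naturally onto a recursive function. The sketch below recognizes <ident_list> over a list of token names; the function name `is_ident_list` and the representation of tokens as plain strings are assumptions made for this example.

```python
def is_ident_list(tokens):
    """Recognize <ident_list> -> ident | ident , <ident_list>."""
    if not tokens or tokens[0] != "ident":
        return False
    if len(tokens) == 1:
        # First alternative: a single ident.
        return True
    # Second alternative: ident , <ident_list>.
    # The recursive call mirrors the recursive use of the LHS in the RHS.
    return tokens[1] == "," and is_ident_list(tokens[2:])


print(is_ident_list(["ident", ",", "ident", ",", "ident"]))
print(is_ident_list(["ident", ","]))
```

The first call accepts a three-element list; the second rejects a list that ends in a dangling comma, because the recursive call receives no tokens.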



Describing Syntax of a Toy Language

<program> -> <stmts>
<stmts> -> <stmt>
| <stmt> ; <stmts>
<stmt> -> <var> = <expr>
<var> -> a
| b
| c
| d
<expr> -> <term> + <term>
| <term> - <term>
<term> -> <var>
| const



BNF Grammars and Derivation
• A derivation is a repeated application of rules, starting
with the start symbol and ending with a sentence

• Example derivation for a = b + const using the BNF
of the preceding slide

<program> => <stmts> => <stmt>
=> <var> = <expr>
=> a = <expr>
=> a = <term> + <term>
=> a = <var> + <term>
=> a = b + <term>
=> a = b + const
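
The derivation above can be replayed mechanically. The sketch below encodes the toy grammar as a dictionary and, at each step, rewrites the leftmost nonterminal with a chosen alternative. The names `GRAMMAR` and `leftmost_derivation`, and the idea of selecting alternatives by index, are assumptions made for this illustration.

```python
# The toy grammar: each nonterminal maps to its list of alternatives (RHSs).
GRAMMAR = {
    "<program>": [["<stmts>"]],
    "<stmts>": [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>": [["<var>", "=", "<expr>"]],
    "<var>": [["a"], ["b"], ["c"], ["d"]],
    "<expr>": [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>": [["<var>"], ["const"]],
}


def leftmost_derivation(choices, start="<program>"):
    """Expand the leftmost nonterminal once per entry in choices.

    Each entry is the index of the alternative to apply.
    Returns every sentential form produced, starting with the start symbol.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Position of the leftmost nonterminal in the current sentential form.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(" ".join(form))
    return steps


steps = leftmost_derivation([0, 0, 0, 0, 0, 0, 1, 1])
print("\n=> ".join(steps))
```

The final sentential form contains only terminals, so it is a sentence of the language.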

Some Terminology
• Every string of symbols (i.e., each line) in the
derivation is a sentential form

• A sentence is a sentential form that has only terminal
symbols

• A leftmost derivation is one in which the leftmost
nonterminal in each sentential form is the one that is
expanded

• A derivation may be neither leftmost nor rightmost

• Derivation order has no effect on the language
generated by a grammar

Parse Trees (PTs)
• Grammars define a hierarchical
syntactic structure of the sentences
of the language they define

• These hierarchical structures are called parse trees

• Notice that:
– Every internal node is labeled
with a nonterminal
– Every leaf is labeled with a
terminal

• Compilers generate code based on
the parse tree



Ambiguity in BNF Grammars
• A grammar is ambiguous iff it generates a sentential form that has
two or more distinct parse trees

• That is, an ambiguous grammar generates two or more PTs for the
same sentence

• Equivalently, a grammar is ambiguous if there are two or more
different leftmost (or rightmost) derivations for the same sentence

• Sources of ambiguity
1. Operator precedence
• A grammar whose rules do not impose enough hierarchical structure
to fix operator precedence is ambiguous
2. Operator associativity
• A grammar whose rules do not impose enough hierarchical structure
to fix operator associativity is ambiguous



Derivation with an Ambiguous Grammar
• Why is the following BNF ambiguous?

<expr> -> <expr> <op> <expr>
| const
<op> -> / | -

• Derive the sentence const-const/const from this grammar
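
One way to see the ambiguity is to enumerate every parse tree the grammar allows for a sentence. The sketch below does this by brute force, using the numbers 8, 4 and 2 in place of const so the trees can also be evaluated. The function names `parse_trees` and `evaluate` and the tuple representation of trees are assumptions made for this example.

```python
OPERATORS = ("/", "-")


def parse_trees(tokens):
    """All parse trees for <expr> -> <expr> <op> <expr> | const.

    A tree is either a constant or a tuple (op, left_subtree, right_subtree).
    """
    if len(tokens) == 1:
        return [tokens[0]] if tokens[0] not in OPERATORS else []
    trees = []
    for i, tok in enumerate(tokens):
        if tok in OPERATORS:
            # Any operator may be the root: the grammar gives no way to choose.
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((tok, left, right))
    return trees


def evaluate(tree):
    """Compute the value a parse tree denotes."""
    if not isinstance(tree, tuple):
        return tree
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    return a - b if op == "-" else a / b


for tree in parse_trees([8, "-", 4, "/", 2]):
    print(tree, "evaluates to", evaluate(tree))
```

The sentence has two parse trees, and they denote different values, which is why an ambiguous grammar is a problem for a compiler that generates code from the parse tree.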



Removing Ambiguity in the BNF
• Remove ambiguity by indicating precedence levels of the
operators in this grammar

• Impose more structure on the PT by providing separate
rules for the operands whose operators have different
precedences
– This technique applies to every ambiguity caused by operator
precedence
– New rules require introducing new nonterminals

• Thus, the preceding BNF becomes:

<expr> -> <expr> - <term> | <term>
<term> -> <term> / const | const



Example Derivation
• Derivation for const-const/const revisited

<expr> => <expr> - <term> => <term> - <term>
=> const - <term>
=> const - <term> / const
=> const - const / const

• The unique PT is:

           <expr>
         /   |    \
    <expr>   -    <term>
      |         /   |    \
    <term>  <term>  /   const
      |       |
    const   const



Another Ambiguous Grammar
• The following grammar is ambiguous due to operator associativity
– Associativity of “+” is not specified

<expr> -> <expr> + <expr> | const (ambiguous)

• Specify associativity to remove ambiguity:

<expr> -> <expr> + const | const (unambiguous)

• Parse tree for const + const + const:

            <expr>
          /   |    \
     <expr>   +    const
    /   |   \
<expr>  +   const
   |
 const

Discussion
• Two different derivations can have the same parse tree. T/F? Explain.

• Consider the grammar:


<A> --> <A> ++ <C> | <C>
<C> --> <C> ** <B> | <B>
<B> --> (<A>) | x | y | z

• Which operator has the highest precedence?

• What is the associativity of ++ and **?

• Modify the above grammar by adding an operator, --, with different
precedence than ++ and **. The precedence of -- is midway between
++ and **. This new operator is right associative.

• Show the correctness of the modified grammar by deriving and
drawing the parse tree for x--y**z++y



Extended BNF (EBNF)
• EBNF adds some notational tools to BNF to make
syntax specification more concise

• Three common extensions to BNF are:
1. Put optional parts in brackets ([ ]):
<method_call> -> ident ( [<expr_list>]);
2. Put alternative parts of RHSs in parentheses and separate them
with vertical bars:
<term> -> <term> (+ | -) const
3. Put repetitions (0 or more) in braces ({ }):
<ident> -> letter {letter | digit}

• The symbols [, ], (, ), {, } are metasymbols (symbols used
to describe other symbols), not terminal symbols
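
The braces in an EBNF rule correspond directly to a loop in a recognizer. The sketch below checks a string against <ident> -> letter {letter | digit}. The function name `is_ident` is hypothetical, and treating Python's `str.isalpha` and `str.isdigit` as the definitions of letter and digit is an assumption made for this example.

```python
def is_ident(text):
    """Recognize <ident> -> letter {letter | digit}."""
    # The rule starts with exactly one letter.
    if not text or not text[0].isalpha():
        return False
    i = 1
    # { letter | digit }: zero or more repetitions become a while loop.
    while i < len(text) and (text[i].isalpha() or text[i].isdigit()):
        i += 1
    # The whole string must have been consumed.
    return i == len(text)


for candidate in ["sum", "x1", "5A"]:
    print(candidate, is_ident(candidate))
```

Under this rule 5A is rejected because it does not begin with a letter.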



BNF and EBNF
• BNF:
<expr> -> <expr> + <term>
| <expr> - <term>
| <term>
<term> -> <term> * <factor>
| <term> / <factor>
| <factor>
• EBNF:
<expr> -> <term> {(+ | -) <term>}
<term> -> <factor> {(* | /) <factor>}



BNF and EBNF (cont’d)
• The six lines in the BNF are now transformed
into two lines

• However, note the following differences
– The BNF rules enforce the associativity of + and -
– The EBNF rules do not imply the direction of
associativity
• Fixed by ensuring that the syntax analysis process is
designed to enforce associativity (see Chapter 4)

• Note that other variations of EBNF exist
– E.g., the one used to describe Java syntax in the Java
Language Specification book
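
A recursive-descent parser is one way a syntax analyzer can supply the associativity that the EBNF rules leave open. In the sketch below each { ... } repetition becomes a loop that folds the tree to the left, which yields left associativity. All function names are hypothetical, and because <factor> is not defined on the slides it is assumed here to be an integer constant.

```python
import re


def tokenize(text):
    """Split the input into integer constants and the four operators."""
    return re.findall(r"\d+|[-+*/]", text)


def parse_factor(tokens, pos):
    # Assumption: <factor> is an integer constant (not defined on the slide).
    if pos >= len(tokens) or not tokens[pos].isdigit():
        raise SyntaxError(f"constant expected at token {pos}")
    return int(tokens[pos]), pos + 1


def parse_term(tokens, pos):
    """<term> -> <factor> {(* | /) <factor>}"""
    tree, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "/"):
        op = tokens[pos]
        right, pos = parse_factor(tokens, pos + 1)
        tree = (op, tree, right)   # fold to the left
    return tree, pos


def parse_expr(tokens, pos=0):
    """<expr> -> <term> {(+ | -) <term>}"""
    tree, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        tree = (op, tree, right)   # fold to the left: left associativity
    return tree, pos


def parse(text):
    """Parse a whole expression and return its tree as nested tuples."""
    tokens = tokenize(text)
    tree, pos = parse_expr(tokens)
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return tree


print(parse("8 - 4 - 2"))
print(parse("8 - 4 / 2"))
```

Folding the other way, by building (op, left, rest) from a recursive call on the remainder, would make the operators right associative; the EBNF text itself is the same in both cases.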



Introduction to Attribute Grammars
• Semantics of programming languages can be categorized into static
and dynamic

• Static semantics
– Deals with legal forms of programs
– Described using attribute grammars

• Dynamic semantics
– Deals with what programs mean or what they do
– Described using operational, axiomatic and denotational means

• Typical examples of attributes
– Type of a variable
– Value of an expression
– Scope of a variable
– Memory location of a variable



Why Attribute Grammars?
• Some characteristics of the structure of PLs are difficult to describe
using BNFs
– E.g., type compatibility

• Other characteristics are impossible to describe using BNF
– E.g., the rule that a variable must be declared before use

• These belong to a category of language rules called static semantic rules
– Described using Attribute Grammars

• Attribute grammars carry some semantic info along through parse
trees

• Attribute grammars are used for static semantics specification and in
compiler design (static semantics checking)



Attribute Grammars (Knuth, 1968)
• An attribute grammar is a CFG G = (S, N, T, P)
with the following additions:
1. For each grammar symbol x (terminal or
nonterminal) there is a set A(x) of attribute values
• A(x) consists of two disjoint subsets: S(x), the synthesized
attributes, and I(x), the inherited attributes

2. Each rule has a set of functions that define certain
attributes of the nonterminals in the rule

3. Each rule has a (possibly empty) set of predicates to
check for attribute consistency



Attribute Grammars
• Let X0 -> X1 ... Xn be a rule

• Functions of the form S(X0) = f(A(X1), ... , A(Xn)) define
synthesized attributes
– Value of a synthesized attribute of a node depends only on the
attributes of its children

• Functions of the form I(Xj) = f(A(X0), ... , A(Xn)), for 1 <=
j <= n, define inherited attributes
– Value of an inherited attribute of a node depends on the attributes
of its parent and those of its siblings
– Why is it good to restrict I(Xj) to the following?
• I(Xj) = f(A(X0), ... , A(Xj-1))

• Initially, there are intrinsic attributes on the leaves


Attribute Grammars: Example 1
• The following attribute grammar describes this rule in Ada:
The name following the end keyword in a procedure definition
must match the name of the procedure

• The syntax rule describing a procedure definition is:

<proc_def> -> procedure <proc_name>
<proc_body>
end <proc_name>;

• The corresponding rule in the attribute grammar is:

Syntax rule: <proc_def> -> procedure <proc_name>[1]
<proc_body>
end <proc_name>[2];
Predicate: <proc_name>[1].string = <proc_name>[2].string

• Notice that repeated nonterminals are distinguished using bracketed subscripts


Attribute Grammars: Example 2
• The following attribute grammar is used to check the type
rules of a simple assignment statement
• The following BNF describes the form of the assignments:
<assign> -> <var> = <expr>
<expr> -> <var> + <var>
| <var>
<var> -> A | B | C
• We are given the following type restrictions:
– The variables can have only int or real type
– The two variables on the RHS need not have the same type
– The type of the expression is real when the types of the operands
are different; otherwise it is the type of the operands
– The type of the LHS must match the type of the RHS
• otherwise the assignment is considered invalid

Example 2: The Attributes
• The attributes of the nonterminals in this
grammar are
– actual_type
• A synthesized attribute for <var> and <expr>
• Used to store the actual type (int or real)
• For a variable, the actual type is intrinsic
• For an expression, the actual type is determined
from the actual types of its child nodes
– expected_type
• An inherited attribute for <expr>
• Determined by the type of the variable on the
LHS of the assignment

Example 2: The Attribute Grammar
• Syntax rule: <assign> -> <var> = <expr>
Semantic rule: <expr>.expected_type <- <var>.actual_type

• Syntax rule: <expr> -> <var>[2] + <var>[3]
Semantic rule: <expr>.actual_type <-
if (<var>[2].actual_type = int) and
(<var>[3].actual_type = int)
then int else real
Predicate: <expr>.actual_type = <expr>.expected_type

• Syntax rule: <expr> -> <var>
Semantic rule: <expr>.actual_type <- <var>.actual_type
Predicate: <expr>.actual_type = <expr>.expected_type

• Syntax rule: <var> -> A | B | C
Semantic rule: <var>.actual_type <- lookup(<var>.string)
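
The attribute computations above can be traced with a small sketch. It assumes the slide's lookup function consults a symbol table, modeled here as a plain dictionary; the function name `decorate_assignment`, its parameters and the types assigned to A, B and C are all hypothetical and chosen only for illustration.

```python
def decorate_assignment(lhs, rhs_vars, symbol_table):
    """Compute the attributes for <assign> -> <var> = <expr> and check the predicate.

    lhs is the variable on the left; rhs_vars holds the one or two
    variables that make up <expr>.
    """
    # Inherited attribute: <expr>.expected_type <- <var>.actual_type (the LHS variable).
    expected_type = symbol_table[lhs]

    # Intrinsic attributes of the leaves: <var>.actual_type <- lookup(<var>.string).
    operand_types = [symbol_table[name] for name in rhs_vars]

    # Synthesized attribute <expr>.actual_type, computed from the children.
    if len(operand_types) == 2:
        both_int = operand_types[0] == "int" and operand_types[1] == "int"
        actual_type = "int" if both_int else "real"
    else:
        actual_type = operand_types[0]

    # Predicate: <expr>.actual_type = <expr>.expected_type.
    return {
        "expected_type": expected_type,
        "actual_type": actual_type,
        "valid": actual_type == expected_type,
    }


# Assumed declarations: A is real, B and C are int.
table = {"A": "real", "B": "int", "C": "int"}
print(decorate_assignment("A", ["A", "B"], table))   # A = A + B
print(decorate_assignment("C", ["A", "B"], table))   # C = A + B
```

With these assumed declarations, A = A + B satisfies the predicate because a real plus an int is real, while C = A + B fails it because the expected type is int.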



Computing Type Attributes for A = A + B



Dynamic Semantics

• Why study (the difficult task of) dynamic semantics?
1. Programmers need to know precisely what the statements of a language do
2. Compiler writers need something more formal than the typical informal
semantics found in programming language documentation
3. Helps in the theoretical investigation of language properties
– Appreciate other language paradigms

• There is no single widely accepted notation or formalism for
describing semantics; three common approaches are:
1. Operational semantics
2. Axiomatic semantics
3. Denotational semantics



Dynamic Semantics
• Operational Semantics
– Uses an abstract machine to describe semantics
– More suitable for language implementers

• Axiomatic semantics
– Uses mathematical logic (independent of a real or abstract machine)
– More suitable for language theorists

• Denotational semantics
– Uses mathematical functions (independent of a real or abstract
machine)
– More suitable for language theorists



Semantics: Why Prove Correctness?

• Proof of program correctness is needed in many
applications

• Examples: Safety-critical applications
– Autopilot systems
– Medical control systems
– Nuclear plant controllers



Operational Semantics
• Operational Semantics
– Describe the meaning of a program by executing its statements on a
machine, either simulated or actual.
– The change in the state of the machine (memory, registers, etc.)
defines the meaning of the statement

• The process:
– Build a translator (translates source code to the machine code of an
idealized computer)
– Build a simulator for the idealized computer

• Examples
– VDL used to describe the semantics of PL/I
– SECD machine used to describe the semantics of Lisp

• Evaluation of operational semantics:
– Good if used informally (language manuals, etc.)
– Extremely complex if used formally (e.g., VDL)



Operational Semantics: Example
• Write the operational semantics of the C for statement:
for (exp1; exp2; exp3)
body
rest
• Operational semantics:
exp1
L1: if exp2 = 0 go to L2
body
exp3
go to L1
L2: rest
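
The two-step process described earlier (a translator plus a simulator for an idealized computer) can be sketched for this translation. The instruction set (`exec`, `label`, `if_zero_goto`, `goto`), the function names `lower_for` and `run`, and the use of Python functions to stand for exp1, exp2, exp3, body and rest are all assumptions made for this example.

```python
def lower_for(exp1, exp2, exp3, body, rest):
    """Translator: emit the low-level form of for (exp1; exp2; exp3) body; rest."""
    return [
        ("exec", exp1),
        ("label", "L1"),
        ("if_zero_goto", exp2, "L2"),   # if exp2 = 0 go to L2
        ("exec", body),
        ("exec", exp3),
        ("goto", "L1"),
        ("label", "L2"),
        ("exec", rest),
    ]


def run(program, state):
    """Simulator: execute the instructions; the state dictionary is the machine memory."""
    labels = {ins[1]: pc for pc, ins in enumerate(program) if ins[0] == "label"}
    pc = 0
    while pc < len(program):
        ins = program[pc]
        if ins[0] == "exec":
            ins[1](state)               # the statement changes the machine state
        elif ins[0] == "if_zero_goto":
            if not ins[1](state):       # condition evaluated to 0 (false)
                pc = labels[ins[2]]
                continue
        elif ins[0] == "goto":
            pc = labels[ins[1]]
            continue
        pc += 1                         # labels fall through to the next instruction
    return state


# for (i = 0; i < 3; i = i + 1) total = total + i;  done = 1;
program = lower_for(
    exp1=lambda s: s.update(i=0),
    exp2=lambda s: s["i"] < 3,
    exp3=lambda s: s.update(i=s["i"] + 1),
    body=lambda s: s.update(total=s["total"] + s["i"]),
    rest=lambda s: s.update(done=1),
)
print(run(program, {"total": 0}))
```

The meaning of the for statement is then the change it makes to the state: here total ends at 0 + 1 + 2 = 3 and i ends at 3.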
