COMPILER CONSTRUCTION
For T.Y.B.Sc. Computer Science : Semester – VI
[Course Code CS 366 : Credits – 2]
CBCS Pattern
As Per New Syllabus
Price ₹ 300.00
N5951
COMPILER CONSTRUCTION ISBN 978-93-5451-309-1
First Edition : January 2022
© : Author
The text of this publication, or any part thereof, should not be reproduced or transmitted in any form or stored in any
computer storage system or device for distribution, including photocopy, recording, taping or information retrieval system, or
reproduced on any disc, tape, perforated media or other information storage device etc., without the written permission of the Author,
with whom the rights are reserved. Breach of this condition is liable for legal action.
Every effort has been made to avoid errors or omissions in this publication. In spite of this, errors may have crept in. Any
mistake, error or discrepancy so noted, when brought to our notice, shall be taken care of in the next edition. It is notified
that neither the publisher nor the author or seller shall be responsible for any damage or loss to anyone, of any kind, in
any manner, arising therefrom. The reader must cross-check all the facts and contents with original Government notifications or
publications.
DISTRIBUTION CENTRES
PUNE
Nirali Prakashan (For orders outside Pune)
S. No. 28/27, Dhayari Narhe Road, Near Asian College, Pune 411041, Maharashtra
Tel : (020) 24690204; Mobile : 9657703143; Email : [email protected]
Nirali Prakashan (For orders within Pune)
119, Budhwar Peth, Jogeshwari Mandir Lane, Pune 411002, Maharashtra
Tel : (020) 2445 2044; Mobile : 9657703145; Email : [email protected]
MUMBAI
Nirali Prakashan
Rasdhara Co-op. Hsg. Society Ltd., 'D' Wing Ground Floor, 385 S.V.P. Road
Girgaum, Mumbai 400004, Maharashtra
Mobile : 7045821020, Tel : (022) 2385 6339 / 2386 9976
Email : [email protected]
DISTRIBUTION BRANCHES
DELHI
Nirali Prakashan, Room No. 2 Ground Floor, 4575/15 Omkar Tower, Agarwal Road, Darya Ganj, New Delhi 110002
Mobile : 9555778814/9818561840; Email : [email protected]
BENGALURU
Nirali Prakashan, Maitri Ground Floor, Jaya Apartments, No. 99, 6th Cross, 6th Main, Malleswaram, Bengaluru 560003, Karnataka
Mobile : 9686821074; Email : [email protected]
NAGPUR
Nirali Prakashan, Above Maratha Mandir, Shop No. 3, First Floor, Rani Jhanshi Square, Sitabuldi, Nagpur 440012 (MAH)
Tel : (0712) 254 7129; Email : [email protected]
[email protected] | www.pragationline.com
The book has its own unique features. It brings out the subject in a very simple and lucid
manner for easy and comprehensive understanding of the basic concepts. The book covers
theory of Introduction to Compilers, Lexical Analysis (Scanner), Syntax Analysis (Parser),
Syntax Directed Definition, Code Generation and Optimization.
A special word of thanks to Shri. Dineshbhai Furia and Mr. Jignesh Furia for
showing full faith in me to write this text book. I also thank Mr. Amar Salunkhe and
Mr. Rahul Thorat of M/s Nirali Prakashan for their excellent co-operation.
I also thank Mrs. Yojana Despande, Mr. Ravindra Walodare, Mr. Sachin Shinde, Mr. Ashok
Bodke, Mr. Moshin Sayyed and Mr. Nitin Thorat.
Although every care has been taken to check mistakes and misprints, some errors may
remain. Any errors or omissions noted, and suggestions from teachers and students for the
improvement of this text book, shall be most welcome.
Author
Syllabus …
1. Introduction (4 Lectures)
• Definition of Compiler, Aspects of Compilation
• The Structure of Compiler
• Phases of Compiler:
o Lexical Analysis
o Syntax Analysis
o Semantic Analysis
o Intermediate Code Generation
o Code Optimization
o Code Generation
• Error Handling
• Introduction to One Pass and Multipass Compilers, Cross Compiler, Bootstrapping
2. Lexical Analysis (Scanner) (4 Lectures)
• Review of Finite Automata as a Lexical Analyzer, Applications of Regular Expressions and
Finite Automata (Lexical Analyzer, Searching using RE), Input Buffering, Recognition of
Tokens
• LEX: A Lexical Analyzer Generator (Simple Lex Program)
3. Syntax Analysis (Parser) (14 Lectures)
• Definition, Types of Parsers
• Top-Down Parser:
o Top-Down Parsing with Backtracking: Method and Problems
o Drawbacks of Top-Down Parsing with Backtracking
o Elimination of Left Recursion (Direct and Indirect)
o Need for Left Factoring and Examples
• Recursive Descent Parsing:
o Definition
o Implementation of Recursive Descent Parser Using Recursive Procedures
• Predictive [LL(1)] Parser (Definition, Model)
o Implementation of Predictive Parser [LL(1)]
o FIRST and FOLLOW
o Construction of LL(1) Parsing Table
o Parsing of a String using LL(1) Table
• Bottom - Up Parsers
• Operator Precedence Parser - Basic Concepts
• Operator Precedence Relations from Associativity and Precedence
o Operator Precedence Grammar
o Algorithm for LEADING and TRAILING (with Examples)
o Algorithm for Operator Precedence Parsing (with Examples)
o Precedence Functions
• Shift Reduce Parser:
o Reduction, Handle, Handle Pruning
o Stack Implementation of Shift Reduce Parser (with Examples)
• LR Parser:
o Model,
o Types [SLR (1), Canonical LR, LALR] - Method and Examples
• YACC:
o Program Sections
o Simple YACC Program for Expression Evaluation
4. Syntax Directed Definition (7 Lectures)
• Syntax Directed Definitions (SDD)
• Inherited and Synthesized Attributes
• Evaluating an SDD at the Nodes of a Parse Tree, Example
• Evaluation Orders for SDD’s
• Dependency Graph
• Ordering the Evaluation of Attributes
• S - Attributed Definition
• L - Attributed Definition
• Application of SDT
• Construction of Syntax Trees
• The Structure of a Type
• Translation Schemes:
o Definition
o Postfix Translation Scheme
5. Code Generation and Optimization (7 Lectures)
• Compilation of Expression:
o Concepts of Operand Descriptors and Register Descriptors with Example.
o Intermediate Code for Expressions - Postfix Notations, Triples, Quadruples and
Expression Trees
• Code Optimization:
o Optimizing Transformations: Compile Time Evaluation, Elimination of Common Sub
Expressions, Dead Code Elimination, Frequency Reduction, Strength Reduction
• Three Address Code
• DAG for Three Address Code
• The Value - Number Method for Constructing DAG’s
• Definition of Basic Block, Basic Blocks and Flow Graphs
• Directed Acyclic Graph (DAG) representation of Basic Block
• Issues in Design of Code Generator
Contents …
CHAPTER
1
Introduction
Objectives…
To study the Concept of Compiler
To understand Structure and Phases of Compiler
1.0 INTRODUCTION
• Compiler construction is truly an engineering science. With this science, we can
methodically, almost routinely, design and implement fast, reliable, and powerful
compilers.
• The name ‘compiler’ is primarily used for programs that translate source code from
a high-level programming language to a lower level language (e.g. assembly
language, object code or machine code) to create an executable program.
• A compiler is a translator which translates a program (the source program) written in
one language to an equivalent program (the target program) written in another
language (see Fig. 1.1).
• We call the languages in which the source and target programs are written the source
and target languages, respectively.
• Typically, the source language is a high-level language in which humans can program
comfortably (such as Java or C++), whereas the target language is the language the
computer hardware can directly handle (machine language) or a symbolic form of it
(assembly language).
Fig. 1.1: Compiler
Source program → Preprocessor → Compiler → Assembly program → Assembler → Loader/Linker
Fig. 1.2: A Language Processing System
• Fig. 1.3 shows the compiler concept. The user can process the input and produce the
output if the target program is an executable machine language program, as shown
in Fig. 1.4.
Fig. 1.3: Concept of Compiler (source program in; target program and error/warning messages out)
Fig. 1.4: Running the Target Code
• Compilers on UNIX platform convert HLL to LLL whereas compilers on DOS platform
usually convert HLL to MLL (Machine Level Language).
• The process of compilation is much more complicated than assembly, due to the various
features supported by HLL, which are:
1. HLL Programs: These programs are machine independent i.e., portable, whereas LLL
programs are machine dependent.
2. Data Types: HLL provides various data types such as float, double, string, etc. It is the
task of the compiler to convert these data types into the basic data types supported by
the machine e.g., byte, word, etc.
3. Data Structures: HLL provides various data structures like arrays, records, files, etc.
These data structures should be mapped to the data structures supported by the machine.
4. Control Structures: HLL provides various control structures like for, while, repeat-
until, etc.
5. Scope Rules: Block structured languages (like C, Pascal) allow nested definition of
functions. HLL requires additional data structures to store the information about
variables.
6. HLL provides runtime support e.g., recursion and dynamic allocation.
Advantages of Compiler:
• Source code is not included by the compiler in its output; therefore compiled code is more
secure than interpreted code.
• Compiler produces an executable file and therefore the program can be run without
need of the source code.
• The object program can be used whenever required, without the need of
recompilation.
Disadvantages:
• When an error is found, the whole program (source code) has to be re-compiled.
• Object code needs to be produced before a final executable file; this can be a slow
process.
Fig. 1.5: Concept of Interpreter
• An interpreter executes a source program statement by statement. The interpreter is a
translator which takes the source program as input and produces output in MLL.
• An interpreter translates only one statement of the program (source code) at a time. It
reads only one statement of program at a time, translates it and executes it.
• Then it reads the next statement of the program again translates it and executes it. In
this way it proceeds further till all the statements are translated and executed
successfully.
Advantages of Interpreter:
1. If an error is found then there is no need to retranslate the whole program, unlike a
compiler.
2. Debugging (check errors) is easier since the interpreter stops when it finds an
error.
3. Easier to create multi-platform (run on different operating system) code, as each
different platform would have an interpreter to run the same source code.
Disadvantages of Interpreter:
1. Source code is required for the program to be executed, and this source code can be
read by any other programmer, so it is not secure.
2. Interpreters are generally slower than compiled programs because an interpreter
translates one line at a time.
• A compiler bridges the semantic gap between a programming language domain and
an execution domain.
• The following are the two aspects of compilation:
1. Compiler generated code which implements meaning of a source program in the
execution domain.
2. The compilation process diagnoses wrong semantics of the source programming
language (PL) or source program.
• To implement these aspects, we discuss the following programming language features:
1. Data type,
2. Data structures,
3. Scope rules,
4. Control structure.
For example, consider the following declarations in 'C':
int i, j;
float x, y;
y = 10;
x = y + i;
• In the first statement, since y is of float type, the compiler generates code to convert the value '10'
to the floating point representation.
• In the second statement, the addition cannot be performed straight away, since the types of y and i are
different. The compiler must generate code to convert i into float, and then the addition is
performed as a floating point operation.
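• For this fragment, the compiler might emit intermediate code of the following form (a sketch; the temporaries t1, t2 and the conversion operator itof are illustrative names, not a fixed notation):
t1 := itof(i)       (convert i to floating point)
t2 := y + t1        (floating point addition)
x := t2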
• By checking the legality of each operation and determining the need for type conversion
operations, the compiler generates type specific code to implement each operation.
• A compiler performs the analysis of the input source program and the synthesis of the executable
target code, as shown in Fig. 1.7.
Source Program → Analysis of source program → Synthesis of target machine program → Code for Target Machine
Fig. 1.7: Compilation Process of Compiler/Structure of Compiler
• The Analysis phase generates an intermediate representation of the source program
and symbol table, which should be fed to the Synthesis phase as input.
• The analysis part collects information about the source program and stores it in a data
structure called a symbol table, which is passed along with the intermediate
representation to the synthesis part.
• The Analysis part/phase consists of three phases i.e., Lexical Analysis, Syntax Analysis
and Semantic Analysis.
• The Synthesis part, known as the back-end of the compiler, generates the target
program with the help of the intermediate source code representation and the symbol table.
• The synthesis phase consists of two phases i.e., Code Optimization and Code
Generation.
• A compiler can have many phases and passes. A pass refers to the traversal of a
compiler through the entire program.
• A phase of a compiler is a distinguishable stage/step, which takes input from the
previous stage, processes and yields output that can be used as input for the next stage.
A pass can have more than one phase.
Lexical analysis → tokens → Syntax analysis → parse tree → Semantic analysis → …
2. Error Handling
o Each phase of compiler can encounter errors. After detecting an error, a phase
must handle the error so that compilation can proceed.
o Error handler is invoked whenever any fault occurs in the compilation process of
source program.
• Let us see the phases of compiler in detail.
Fig. 1.9: Use of Scanner
• Table 1.1 shows tokens, token types and their values.
Table 1.1
Sr. No. Token Token type Value
1. Max, a, b Identifier Pointer to symbol table
2. 20, 56 Number (constant) Value of number
3. for, else Reserved word A number is given to each reserved word
4. +, *, > Operator ASCII value of operator
5. ;, $, #, · Special character ASCII value of character
Fig. 1.10: DFA (start state 0, on Letter to state 1; Letter/digit loop on state 1; Delimiter to final state 2)
• In this DFA, state 0 is the start state and state 2 is the final state, indicated with a double
circle. If a character other than a letter or digit is seen, the transition is from state (1) to
state (2), and state (2) recognizes that the token is an identifier.
• Thus, looking at the final state we can tell which token type it is. This DFA is converted
into a State Transition Table (STT) as follows:
Inputs
State      Letter   Digit   Other
0          1        –       –
1          1        1       2 (Accept)
2          –        –       –
• All blank entries indicate the error state. This type of transition table can easily be put
into a program, with a driver routine to run it, as the sketch below shows.
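• As an illustration, a table-driven driver for the above STT might be coded in C as follows (a sketch; the names stt, classify and scan_id are illustrative, not part of any standard library):
#include <stdio.h>
#include <ctype.h>

enum { LETTER, DIGIT, OTHER };          /* input character classes          */
#define ERR (-1)                        /* blank table entry = error state  */

/* the state transition table from above; state 2 is the accepting state */
static const int stt[3][3] = {
    /*         LETTER  DIGIT  OTHER */
    /* 0 */ {    1,     ERR,   ERR },
    /* 1 */ {    1,      1,     2  },
    /* 2 */ {   ERR,    ERR,   ERR }
};

static int classify(int ch)
{
    if (isalpha(ch)) return LETTER;
    if (isdigit(ch)) return DIGIT;
    return OTHER;
}

/* returns 1 if s starts with an identifier followed by a delimiter */
int scan_id(const char *s)
{
    int state = 0;
    while (state != ERR && state != 2)
        state = stt[state][classify((unsigned char)*s++)];
    return state == 2;
}

int main(void)
{
    printf("%d\n", scan_id("max1 "));   /* prints 1 : identifier     */
    printf("%d\n", scan_id("2max "));   /* prints 0 : not identifier */
    return 0;
}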
• A utility like Lex (a program that generates lexical analyzers) on UNIX takes a
regular expression as input and generates a transition table for the given regular expression.
• A driver is also present to use this table. Writing a lexical analyzer on UNIX is therefore
comparatively easy.
• It basically involves grouping of the statements into grammatical phrases that are used
by the compiler to generate the output finally. The grammatical phrases of the source
program are usually represented by a parse tree.
• The parser, a program usually part of the compiler, converts the tokens produced by the lexical
analyzer into a tree like representation called a parse tree.
• A parse tree is a structural representation of the input being parsed. A parse tree is a
graphical representation of a derivation.
• A derivation is basically a sequence of production rules applied in order to get the input string.
Derivation is used to find whether the string belongs to a given grammar.
• A syntax analyzer takes the input from a lexical analyzer in the form of token streams.
It analyses the source code (token stream) against the production rules to detect any
errors in the code. The output of this phase is a parse tree.
• For example, a = b + i can be represented in syntax tree form as shown in Fig. 1.11.
Fig. 1.11: Syntax Tree/Parse Tree (root =, with children a and +; + has children b and i)
Fig. 1.12: Syntax Tree of Statement a = b + i (a, b of type Real; i of type Integer)
Fig. 1.13
• Semantic analyzer (Mapper) converts or maps syntax trees for each construct into a
sequence of intermediate language statements. Some compilers have the ability to do
such conversion automatically.
• One of the intermediate codes used in many compilers is three address code.
This code has at most three operands per instruction and it consists of a sequence of instructions.
For example, a := b + c
• Three address code is,
temp1 := b + c
temp2 := temp1
a := temp2
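• For a statement with more than one operator, say a := b + c * d, each instruction still has at most three operands:
temp1 := c * d
temp2 := b + temp1
a := temp2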
Fig. 1.14: Optimization in Compiler (performed on the intermediate representation (IR))
Advantages of Code Optimization:
1. An optimized program may occupy around 25 per cent less storage and execute up to three times
faster than the un-optimized program.
2. Reduces cost of execution.
Disadvantage of Code Optimization:
1. Around 40% extra compilation time is needed.
• The following example shows how the statement is translated by a compiler. Consider
statement, a := b − c * 40
• The translation of this statement is shown in the Fig. 1.15.
a := b − c * 40
↓ Lexical analyzer
↓ Syntax analyzer
:=  (Id1, −(Id2, *(Id3, 40)))
↓ Semantic analyzer
:=  (Id1, −(Id2, *(Id3, inttoreal(40))))
↓ Intermediate code generator
↓ Code optimizer
↓ Code generator
MOV Id3, r1
MULT 40.0, r1
MOV Id2, r2
SUB r1, r2
MOV r2, Id1
Fig. 1.15: Translation of a Statement by Compiler
• A single-pass compiler is a compiler that passes through the source code of each
compilation unit only once.
• In other words, a single/one-pass compiler does not "look back" at code it has
previously processed.
• A one-pass compiler is a compiler that passes through the parts of each compilation
unit only once, immediately translating each part into its final machine code.
• This is in contrast to a multi-pass compiler, which converts the program into one or more
intermediate representations in steps between source code and machine code, and which
reprocesses the entire compilation unit in each sequential pass.
• This refers to the logical functioning of the compiler, not to the actual reading of the
source file once only.
• For instance, the source file could be read once into temporary storage but that copy
could then be scanned many times.
• One-pass compilers are smaller and faster than multi-pass compilers. One-pass
compilers are easy to implement but suffer from high storage requirements.
• The entire program is kept in memory because one phase may need information in a
different order than a previous phase produces it.
• The internal form of a program may be considerably longer than either the source
program or the target program. Back-patching is used to fill in symbol addresses that are not
known when the code is first emitted.
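• For example, a forward jump whose target label is not yet defined is emitted with a blank address field, and the location of that instruction is recorded; when the label is finally defined, the recorded instruction is patched with the real address:
        goto L1        (address field left blank; location of this instruction recorded)
        ...
L1:     x := y + z     (L1 now defined; the earlier goto is back-patched with this address)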
• A one-pass compiler traverses the program only once. A single pass compiler is
also called a narrow compiler.
• The one-pass compiler passes only once through the parts of each compilation unit. It
translates each part into its final machine code.
• Fig. 1.16 shows one-pass compiler.
Fig. 1.16: One-pass Compiler
Advantages:
1. A single-pass compiler is faster than the multi-pass compiler.
2. One-pass compiler uses few passes for compilation.
3. Compilation process in single-pass compiler is less time consuming than multi-pass
compiler.
Disadvantages:
1. A single-pass compiler takes more space than the multi-pass compiler.
2. In a single-pass compiler, the complicated optimizations required for high quality
code generation are not possible.
3. To count the exact number of passes for an optimizing compiler is a difficult task.
4. One-pass compilers are unable to generate as efficient programs, due to the limited
scope of available information.
5. Some programming languages simply cannot be compiled in a single pass, as a
result of their design.
• A two-pass compiler is a compiler which processes the source program twice to
generate the object code.
• First pass is called as Pass-I. It performs tasks like Lexical analysis, Syntax analysis,
Semantic analysis and intermediate code generation.
• Second pass is called as Pass-II. It performs tasks like Storage allocation, Code
optimization and Code generation.
• A two-pass compiler suffers from high storage requirements, a little less than a one-pass
compiler. Backpatching is not required, but the execution time required is more than for
one-pass.
• Fig. 1.17 shows two-pass compiler.
Fig. 1.17: Two-pass Compiler
• The first, second and third passes are called Pass-I, Pass-II and Pass-III respectively,
as given below:
o In the Pass I, compiler can read the source program, scan it, extract the tokens and
store the result in an output file.
o In the Pass II, compiler can read the output file produced by first pass, build the
syntactic tree and perform the syntactical analysis. The output of this phase is a
file that contains the syntactical tree.
o In Pass III, the compiler reads the output file produced by the second pass and
checks whether the tree follows the rules of the language or not. The output of the semantic
analysis phase is the annotated syntax tree.
• Such passes continue until the target output is produced.
• A multi-pass compiler is also called a wide compiler. Fig. 1.18 shows a multi-pass compiler.
Fig. 1.18: Multi-pass Compiler
Advantages of Multi-pass Compiler:
1. A multi-pass compiler requires less memory space than a single-pass compiler.
2. The wider scope thus available to these compilers allows better code generation.
Disadvantages of Multi-pass Compiler:
1. In a multi-pass compiler each pass reads and writes an intermediate file, which
makes the compilation process time consuming.
2. The multi-pass compilers are slower than single-pass compiler.
3. The time required for compilation increases with the increase in the number of
passes in a compiler.
Fig. 1.19: T Diagram Representation in Cross Compiler (Implementation Language I)
• A cross compiler is a type of compiler that can create executable code for different
machines other than the machine it runs on.
• A cross compiler can create an executable code for a platform other than the one on
which the compiler is running.
• For example, a compiler that runs on Windows platform also generates a code that
runs on Linux platform is a cross compiler.
Fig. 1.20 (S : source language, T : target language)
Fig. 1.21 (a): S → C_L → O (a source program S is translated by compiler C, written in language L, into object code O)
Fig. 1.21 (b): Bootstrapping Process (a compiler written in the C language producing object code)
• These three may be different languages, or the source program language and the compiler
implementation language can be the same.
• Suppose a compiler for a new language L can be written in the same language L. For example,
suppose we want a compiler for a new language L, which should be available on different
machines, say M and N. Consider first that language L will work on machine M.
• The source language for the compiler is L and the target language is M, so we want to develop
the compiler C^LM_M (superscripts: source and target languages; subscript: the language the
compiler itself runs in).
• So first we write a small compiler C^SM_M for machine M, which translates a subset
S of language L into the machine or assembly code of M. As we have C^SM_M, we can
easily write a compiler C^LM_S in the subset S and compile it, as shown in Fig. 1.22:
C^LM_S → C^SM_M → C^LM_M
Fig. 1.22: Bootstrapping a Compiler
• Now, consider machine N; we want to develop another compiler for L to run on N.
The source language is L and the target language is N. So we want to convert C^LM_S into the
compiler C^LN_L, which implements the full language L, and using C^LN_L we produce C^LN_N, which is the
compiler for L on N.
• This process is shown in Fig. 1.23:
C^LN_L → C^LM_M → C^LN_M
C^LN_L → C^LN_M → C^LN_N
Fig. 1.23: Bootstrapping a Compiler to another Machine
• Bootstrapping is an important concept for building a new compiler. Suppose we want
to create a cross compiler for the new source language S that generates target code in
language T, and the implementation language of this compiler is A.
• We can represent this compiler as C^ST_A (see Fig. 1.24 (a)). Further, suppose we already
have a compiler written for language A with both target and implementation language
as M.
• This compiler can be represented as C^AM_M (see Fig. 1.24 (b)). Now, if we run C^ST_A with
the help of C^AM_M, then we get a compiler C^ST_M (see Fig. 1.24 (c)). This compiler compiles a
source program written in language S and generates the target code in T, which runs
on machine M i.e., the implementation language for this compiler is M.
(a) Compiler C^ST_A    (b) Compiler C^AM_M    (c) Compiler C^ST_M
Fig. 1.24: Bootstrapping
Example:
• In the Fig. 1.25 there are three T diagrams.
o First T diagram contains the P-compiler written in PASCAL. It converts COBOL
language into object code.
o The second T diagram contains the machine language compiler which converts P-code
into machine language.
o Third T diagram contains machine language compiler which converts PASCAL
program into object code.
Fig. 1.25: The Three T Diagrams
Advantages of Bootstrapping:
1. In bootstrapping, a compiler can be written in the language it compiles.
2. Using bootstrapping techniques, an optimizing compiler can optimize itself.
3. In bootstrapping, compiler developers only need to know the language being
compiled.
PRACTICE QUESTIONS
Q. I Multiple Choice Questions:
1. Which is a translator (translates source code to object/target code)?
(a) Compiler (b) Assembler
(c) Interpreter (d) None of the mentioned
2. Which is a computer program that directly executes instructions written in
a programming language without requiring them previously to have been compiled?
(a) Compiler (b) Assembler
(c) Interpreter (d) None of the mentioned
3. Which is the process translation of source code into target code by a compiler?
(a) interpretation (b) compilation
(c) Both (a) and (b) (d) None of the mentioned
4. Aspects of compilation include,
(a) Data structures (b) scope rules
(c) Data types and Control structures (d) All of the mentioned
5. The structure of a compiler is composed of mapping of two parts namely,
(a) analysis (front-end of the compiler) (b) synthesis (back-end of the compiler)
(c) Both (a) and (b) (d) None of the mentioned
6. Which refers to the traversal of a compiler through the entire source program?
(a) token (b) lexeme
(c) pass (d) None of the mentioned
7. Which is the final phase of compiler?
(a) lexical analysis (b) syntax analysis
(c) code optimization (d) code generation
8. Lexical analysis is also known as,
(a) parsing (b) scanning
(c) Interpreter (d) None of the mentioned
9. Syntax analysis is also known as,
(a) parsing (b) scanning
(c) Interpreter (d) None of the mentioned
10. Compiler translates the source code to,
(a) executable code (b) machine code
(c) Both (a) and (b) (d) None of the mentioned
11. Compiler should report the presence of _______ in the source program, in
translation process.
(a) classes (b) errors
(c) objects (d) None of the mentioned
12. Which data structure created and maintained by compilers in order to store
information about the occurrence of various entities such as variable names,
function names, objects, classes, interfaces, etc.?
(a) Compiler table (b) Symbol table
(c) Compilation table (d) None of the mentioned
13. The lexical analysis scans the input program character by character and groups
the character into the lexical units called,
(a) tokens (b) scanners
(c) parsers (d) None of the mentioned
14. How many tokens are there in the statement printf(“k= %d”, k); ?
(a) 11 (b) 10
(c) 4 (d) 6
15. A process, a string of tokens can be generated by,
(a) parsing (b) scanning
(c) analyzing (d) translating
16. Which of the following is not a phase of compiler?
(a) syntax (b) testing
(c) lexical (d) semantic
3. The analysis (front-end) part of the compiler reads the source program, divides it
into core parts and then checks for lexical, grammar and syntax errors.
4. After lexical analysis phase the compiler generates an intermediate code of the
source code for the target machine.
5. In code generation phase, the code generator takes the optimized representation of
the intermediate code and maps it to the target machine language.
6. Bootstrapping is the technique for producing a self-compiling compiler i.e.,
a compiler written in the source programming language that it intends to compile.
7. Each individual unique step in compilation process is called as pass.
8. An interpreter is a computer program that is used to directly execute program
instructions written using one of the many high-level programming languages.
9. The lexical analyzer breaks these syntaxes into a series of tokens, by removing any
whitespace or comments in the source code.
10. In programming language, keywords, constants, identifiers, strings, numbers,
operators and punctuations symbols can be considered as tokens.
11. Syntax analysis is the second phase of compiler which is also called as scanning.
12. Multi-pass compiler scans the input source once and makes the first modified
form, then scans the first-produced form and produces a second modified form,
etc., until the object form is produced.
13. The structure of a compiler is composed of mapping of analysis and synthesis
parts.
14. Code generation checks whether the parse tree constructed follows the rules of
language.
Answers
1. (T) 2. (T) 3. (T) 4. (F) 5. (T) 6. (T) 7. (F) 8. (T) 9. (T) 10. (T)
11. (T) 12. (F) 13. (T) 14. (F)
Q. IV Answer the following Questions:
(A) Short Answer Questions:
1. Define compiler.
2. Define interpreter.
3. List phases of compiler.
4. Define compilation.
5. Enlist types of compilers.
6. Compare compiler and interpreter. Any two points.
7. Define one-pass compiler.
8. What is multi-pass compiler?
9. Define the term pass.
10. “Symbol table is not required in all phases of compilers”. State true or false.
11. What is scanner?
12. What is the need of code optimization?
13. What is phase in compiler?
14. Define bootstrapping.
15. “Compiler is a translator”. State true or false.
16. What is the purpose of lexical analysis phase in compiler?
17. Define cross compiler.
18. Which phase in compiler is used for parsing?
19. What is the purpose of code optimization?
20. Define error in compiler.
21. List advantages of interpreter.
(B) Long Answer Questions:
1. What is compiler? Explain with diagram. Also state its advantages and
disadvantages.
2. What is interpreter? How does it work? How does it differ from compiler?
3. Write a short note on: Aspect of compilation.
4. Describe data types and data structures in compiler.
5. With the help of example describe scope rules.
6. Explain structure of compiler diagrammatically.
7. With the help of diagram describe phase of compiler.
8. What is lexical analysis? Explain in detail.
9. What is semantic analysis? Describe in detail.
10. What is symbol table? How to manage it? Describe in detail.
11. What is multi-pass compiler? Explain diagrammatically with its advantages and
disadvantages.
12. How is the FA used in scanner? Explain in detail.
13. What is parsing? How to use it in compiler?
14. What is parse tree? Explain with example.
15. Explain the term intermediate code generation in detail.
16. What is code optimization? Explain with diagram.
17. What is error? How to handle it in compiler? Explain with example.
18. What is one-pass compiler? Explain diagrammatically with its advantages and
disadvantages.
19. What is cross compiler? How to represent it? Explain in detail.
20. What is bootstrapping? How to use it? Describe with example.
21. Differentiate between one-pass, multi-pass and cross compilers.
1. State True or False, “Target code is generated in the analysis phase of the
compiler”. [1 M]
Ans. Refer to Section 1.3.
2. Differentiate between one-pass compiler and two-pass compiler. [5 M]
Ans. Refer to Sections 1.4.1 and 1.4.2.
April 2018
1. What is a cross compiler? [1 M]
Ans. Refer to Section 1.5.
October 2018
1. List the phases of compiler. [1 M]
Ans. Refer to Section 1.3.
2. Define the term bootstrapping. [1 M]
Ans. Refer to Section 1.6.
April 2019
1. Define cross compiler. [1 M]
Ans. Refer to Section 1.5.
CHAPTER
2
Lexical Analysis (Scanner)
2.0 INTRODUCTION
• Lexical analysis is the process of converting a sequence of characters from source
program into a sequence of tokens.
• Lexical analysis is the act of breaking down source text into a set of words called
tokens. Each token is found by matching sequential characters to patterns.
• A program which performs lexical analysis is termed as a lexical analyzer (lexer),
tokenizer or scanner.
• The scanner or Lexical Analyzer (LA) performs the task of reading a source text as a
file of characters and dividing them up into tokens.
• All the tokens are defined with regular grammar, and the lexical analyzer identifies
strings as tokens and sends them to the syntax analyzer for parsing.
• A typical lexical analyzer or scanner is shown in Fig. 2.1.
Fig. 2.2: Working of Scanner (driven by a lexical specification given as regular expressions)
• The scanner scans the input program character by character and groups the character
into the lexical units called tokens.
• Lexical analysis, lexing or tokenization is the process of converting a sequence of
characters (such as in a computer program or web page) into a sequence of tokens
(strings with an assigned and thus identified meaning).
• A program that performs lexical analysis may be termed a lexer, tokenizer or scanner
although scanner is also a term for the first stage of a lexer.
• In the case of a large source program, a significant amount of time is required to process the
characters during compilation. To reduce the amount of overhead needed to
process a single character from the input character stream, specialized buffering
techniques have been developed.
• The lexical analyzer scans the characters of the source program one at a time to
discover tokens; however, many characters beyond the next token may have to be
examined before the next token itself can be determined.
• For this reason two pointers are used, one pointer to mark the beginning of the token
being discovered, and the other, a look ahead pointer, to scan ahead of the beginning
point, until the token is discovered.
• Fig. 2.5 shows how to scan the input tokens using the look ahead pointer.
Fig. 2.5: Example of Input Buffer (lexemeBegin pointer and forward pointer)
• A buffer can be divided into two halves. If the look ahead pointer moves past the
halfway point in the first half, the second half is filled with new characters to be read.
• If the look ahead pointer moves towards the right end of the second half of the buffer,
the first half is refilled with new characters, and so on.
• Fig. 2.6 shows the buffer divided into two halves of n-characters, where n is number of
characters on one disk block e.g., 1024.
Fig. 2.6: Input Buffer with Two Halves
2.6
Compiler Construction Lexical Analysis (Scanner)
• The string of characters between the two pointers is the current token. Initially
both pointers point to the first character of the next token to be found.
• Once the token is found, the look ahead pointer is set to the character at its right end
and the beginning pointer is set to the first character of the next token. White spaces
(blanks, tabs, newlines) are not tokens.
• If the look ahead pointer moves beyond the buffer halfway mark, then other half is
filled with the next characters from the source file.
• Since the look ahead pointer moves from the left half to the right half and back again, it is
possible that we may lose characters that have not yet been grouped into tokens.
• Every time the left half is exhausted, the right half is loaded; if the forward pointer moves to
the end of the right half, the left half is loaded again.
• We could make the buffer larger, but a better option is another buffering scheme. We use sentinel
buffering so that characters are not lost when we move from the left half to the right half and vice
versa.
• The code of input buffering is as follows:
if lookahead pointer is at end of left half
then
    load the right half
    and increment lookahead pointer
else if lookahead pointer is at end of right half
then
    load the left half and move lookahead pointer to the beginning of left half
else
    increment lookahead pointer
• These buffering techniques make the reading process easy and also reduce the
amount of overhead required to process a character.
• In sentinels we use special character that is not the part of source program. This
character is at the end of each half. So every time look ahead pointer checks this
character and then the other half is loaded.
• The sentinel is a special character that cannot be part of the source program, and a
natural choice is the character eof.
• The advantage is that we will not lose characters for which a token is not yet formed
while moving from one half to the other.
2.7
Compiler Construction Lexical Analysis (Scanner)
• Normally, eof is the special character used. An additional test is then required to check
whether the lookahead pointer points to a real eof or merely to a sentinel at the end of a half.
• Fig. 2.7 shows the sentinel buffering scheme.
Fig. 2.7: Buffering with Sentinels
• In order to optimize the number of tests to one for each advance of forward pointer,
sentinels are used with buffer, (see Fig. 2.8).
• The idea is to extend each buffer half to hold a sentinel at the end. The ‘eof’ is usually
preferred as it will also indicate end of source program.
• The sentinel is a special character eof (end of file) that cannot be a part of source
program.
Fig. 2.8: Example of Sentinels (lexeme beginning and forward pointers)
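• A C sketch of the forward-pointer advance under this scheme is shown below (the buffer size, the fill() helper and the use of '\0' as the eof sentinel are illustrative assumptions; the input is assumed to be text that contains no '\0' bytes):
#include <stdio.h>

#define N 16                          /* characters per buffer half (small for illustration) */
static char buf[2 * N + 2];           /* two halves, each followed by a sentinel slot        */
static char *forward;                 /* the look ahead (forward) pointer                    */
static FILE *src;

/* fill one half with up to N characters and plant the sentinel after them */
static void fill(char *half)
{
    size_t n = fread(half, 1, N, src);
    half[n] = '\0';                   /* '\0' plays the role of the eof sentinel here */
}

/* advance the forward pointer; only one sentinel test per character read */
static int next_char(void)
{
    int c = (unsigned char)*forward++;
    if (c != '\0')
        return c;                              /* the common, cheap case           */
    if (forward == buf + N + 1) {              /* sentinel at end of first half    */
        fill(buf + N + 1);
        forward = buf + N + 1;
        return next_char();
    }
    if (forward == buf + 2 * N + 2) {          /* sentinel at end of second half   */
        fill(buf);
        forward = buf;
        return next_char();
    }
    return EOF;                                /* sentinel inside a half: real eof */
}

int main(void)
{
    int c, count = 0;
    src = stdin;
    fill(buf);                                 /* prime the first half */
    forward = buf;
    while ((c = next_char()) != EOF)
        count++;
    printf("%d characters read\n", count);
    return 0;
}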
• Transition diagrams show the actions that take place when a lexical analyzer is called
by the parser to get the next token.
• We use a transition diagram to keep track of information about characters that are
seen as the forward pointer scans the input.
• The design of lexical analyzers depends more or less on automata theory. In fact, a
lexical analyzer is a finite automaton design.
• Lexical analysis is a part of compiler and it is designed using finite automaton, which
is known as Lexical analyzer.
• Lexical analyzer is used to recognize the validity of input program, whether the input
program is grammatically constructed or not.
• For example, suppose we wish to design a lexical analyzer for identifiers; an identifier
is defined to be a letter followed by any number of letters or digits, as follows,
identifier = letter (letter | digit)*
• It is easy to see that the DFA (Deterministic Finite Automata) in Fig. 2.9 will accept the
above defined identifier. The corresponding transition table for the DFA is given as
given below:
State/symbol   Letter   Digit
A              B        C        Initial state : A
B              B        B        Final state : B
C              C        C
Fig. 2.9: DFA that Accepts Identifier
2.11
Compiler Construction Lexical Analysis (Scanner)
5. retract: Used to retract the look ahead pointer by one character. When a token is found,
the next character is a delimiter; since the delimiter is not part of the token, we call
the retract procedure.
• To convert transition diagrams into a program, we can write code for each state. To
obtain the next character from the input buffer we use the getchar() function. Scanning
then proceeds until a state with no outgoing edge for the current character (e.g. a blank or
new line, ws) is reached.
• It is denoted by double circle, and here the token is found. If all transition diagrams
have been tried without success, then error correction routine is called. We use '*' to
indicate states on which retract is called.
• The pseudo code is as follows:
Pseudo-code for keyword:
state 0 : c = getchar();
if c='i' goto state 1;
else
if c='e' goto state 4;
else
error();
state 1 : c = getchar();
if c='f' goto state 2;
else
error();
state 2 : c = getchar();
if delimeter(c)
goto state 3;
else
error();
state 3 : retract();
return (if,install());
state 4 : c = getchar();
if c='l' goto state 5
else
error();
state 5 : c = getchar();
if c='s' goto state 6
else
error();
state 6 : c = getchar();
if c='e' goto state 7
else
error();
state 7 : c = getchar();
if
delimeter(c) goto state 8;
else
error();
state 8 : retract();
return (else,install());
• Similarly we can write pseudo code for identifier as follows :
state 18 : c := getchar();
if letter (c) goto state 19;
else
error();
state 19 : c := getchar();
if letter (c) or digit (c) then goto state 19;
else if delimiter (c) goto state 20;
else
error ();
state 20 : retract()
return(id, install ());
• Hence, for the recognition of tokens the scanner is represented by an FA, and it can then be
implemented by writing code for each state. In UNIX, lex or flex is the utility available
which is used to generate a lexical analyzer automatically.
Example 1: Create pseudo-code for identifier of C language.
Solution: First find the regular expression for an identifier:
id → l (l | d)*
i.e., the pattern for id is
[a-zA-Z][a-zA-Z0-9]*
We design the transition diagram for identifier as follows:
Fig. 2.12: Transition Diagram
Pseudo-code:
state 0 : c = getchar();
if letter (c) goto state 1
else
    error();
state 1 : c = getchar();
if letter (c) or digit (c) goto state 1
else if delimiter (c) goto state 2
else
    error();
state 2 : retract();
return (id, install ());
Example 2: Create pseudo-code for a number of C language.
Solution: Number is (digit) (digit)*
The finite automata is,
Fig. 2.13: The Finite Automata
Pseudo-code:
state 0 : c = getchar();
if digit (c) goto state 1
else
    error();
state 1 : c = getchar();
if digit (c) goto state 1
else if delimiter (c) goto state 2
else
    error();
state 2 : retract();
return (num, install ());
Example 3: Write pseudo-code for a hexadecimal number of C language.
Solution: A hex number starts with 0X or 0x and then contains the characters 0-9, A-F, a-f.
The pattern is 0[xX][0-9A-Fa-f]+
2.15
Compiler Construction Lexical Analysis (Scanner)
The FA is,
Fig. 2.14: Finite Automata Diagram
Pseudo-code:
state 0 : c = getchar();
if c = '0' goto state 1
else
    error();
state 1 : c = getchar();
if c = 'x' or c = 'X' goto state 2
else
    error();
state 2 : c = getchar();
if (c >= '0' and c <= '9') or (c >= 'A' and c <= 'F') or (c >= 'a' and c <= 'f') goto state 2
else if delimiter (c) goto state 3
else
    error();
state 3 : retract();
return (literal, install ());
• Regular expressions play an important role in the study of finite automata and their
applications. The oldest application of regular expressions was in specifying the
component of a compiler called a lexical analyzer.
• Regular expressions are extensively used in the design of lexical analyzer. Regular
expression is used to represent the language (lexeme) of finite automata (lexical
analyzer).
• A regular expression is a compact notation that is used to represent the patterns
corresponding to a token.
• Regular languages, which are defined by regular expressions are used extensively for
matching patterns within text (as in Word processing or Internet searches) and for
lexical analysis in computer language compilers.
• Regular expressions and finite automata are powerful tools for encoding text patterns
and searching for these patterns in textual data.
• For example, hashtags in social media messages and posts (e.g. #OccupyWallStreet)
can be found using the notation (^|\s)#([A-Za-z0-9_]+), where (^|\s) denotes the start of
the text or a whitespace character preceding the '#', and ([A-Za-z0-9_]+) captures the body of the hashtag.
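• As an illustration, a small C program using the POSIX regular expression library can search for such hashtags (a sketch; the pattern is rewritten in POSIX syntax, where \s is expressed as the character class [ \t]):
#include <stdio.h>
#include <regex.h>

int main(void)
{
    const char *text = "see #OccupyWallStreet and #compilers today";
    regex_t re;
    regmatch_t m[3];                 /* whole match + two groups */
    const char *p = text;

    /* group 2 captures the hashtag body */
    if (regcomp(&re, "(^|[ \t])#([A-Za-z0-9_]+)", REG_EXTENDED) != 0)
        return 1;

    while (regexec(&re, p, 3, m, 0) == 0) {
        printf("#%.*s\n", (int)(m[2].rm_eo - m[2].rm_so), p + m[2].rm_so);
        p += m[2].rm_eo;             /* resume the search after this match */
    }
    regfree(&re);
    return 0;
}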
• These positions correspond to the prefixes of the word, ranging from the empty string
(i.e., nothing of the word has been seen so far) to the complete word.
Fig. 2.15: A Finite Automaton Modeling Recognition of then (states ε, t, th, the, then; transitions on the letters t, h, e, n)
• In Fig. 2.15, the five states are named by the prefix of then seen so far. Inputs
correspond to letters.
• We may imagine that the lexical analyzer examines one character of the program that
it is compiling at a time and the next character to be examined is the input to the
automaton.
• The start state corresponds to the empty string and each state has a transition on the
next letter of then to the state that corresponds to the next-larger prefix.
• The state named then is entered, when the input has spelled the word then. Since it is
the job of this automaton to recognize when then has been seen, we could consider
that state the lone accepting state.
• Lexical analysis can be performed with pattern matching through the use of regular
expressions. Therefore, a lexical analyzer can be defined and represented as a DFA.
• Recognition of tokens implies implementing a regular expression recognizer. This
entails implementation of a DFA.
• Fig. 2.16 (a) shows steps in token recognition.
• The identified token is associated with a pattern which can be further specified using
regular expressions.
• From the regular expression, we construct a DFA. Then the DFA is turned into code
that is used to recognize the token.
Token → Regular Expression → DFA → Token Recognition
Fig. 2.16 (a): Steps in Token Recognition
• Fig. 2.16 (b) shows an example of a transition diagram. Suppose that we want to build a
lexical analyzer recognizing identifiers, >=, > and integer constants.
• The corresponding DFA that recognizes the tokens is shown in the Fig. 2.16 (b).
Fig. 2.16 (b): DFA that Recognizes Tokens id, integer_const, op_ge (>=), op_gt (>), etc.
2.4 LEX: LEXICAL ANALYZER GENERATOR [Oct. 16, April 17, 18, 19]
• Lex is a computer program that generates lexical analyzers ("scanners" or "lexers").
The purpose of a lex program is to read an input stream and recognize tokens.
• Use a lexical-analyzer generator, such as the Lex compiler, to produce the lexical
analyzer from a regular-expression based specification. In this case, the generator
provides routines for reading and buffering the input.
• Lex is better known as Lexical Analyzer Generator in Unix OS. It is a command that
generates lexical analysis program created from regular expressions and C language
statements contained in specified source files.
• Lexical Analyzer Generator introduces a tool called Lex, which allows one to specify a
lexical analyzer by giving regular expressions to describe the pattern for each token. The
Lex tool itself is a lex compiler.
• A lex compiler or simply lex is a tool for automatically generating a lexical analyzer
for a language. It is an integrated utility of the UNIX operating system. The input
notation for the lex is referred to as the lex language.
• Lex is a program used to construct lexical analyzer for a variety of languages. The
input of lex is in lex language.
• Lex is generally available on UNIX or LINUX. It generates a scanner written in 'C'
language.
• Lex is one program of many that is used for generating lexical analyzers based on
regular expressions.
• A lexical analyzer is a program that processes strings and returns tokens that are
recognized within the input string.
• The token identifies the recognized substring and associates attributes with it. Lex uses
regular expressions to specify recognizable tokens of a language.
• Fig. 2.17 shows the lexical analyzer creation using lex.
Fig. 2.17: Lexical Analyzer on Lex
• The source program is written in the lex language, in a file having the extension .l. This program
is run through the lex compiler, which always generates a C program lex.yy.c, which we link with
the lex library -ll.
• This program contains a representation of the transition diagram for every regular
expression present in the source. Then lex.yy.c is compiled with a 'C' compiler, which
generates the output file a.out.
• The a.out is used for transforming the input stream into a sequence of tokens. The
compilation process on LINUX is as follows:
$ lex sample.l
$ cc lex.yy.c -ll (where -ll is optional)
$ ./a.out
Lex Program Specification: [Oct. 16]
• A lex program consists of three sections and each section is separated by %%.
definition or declaration
%%
translation rules
%%
procedures written in 'C' language
• The three sections of lex program are explained below:
1. The declaration section consists of declarations of header files, variables and
constants. It also contains regular definitions. We surround the C code with the special
delimiters "%{" and "%}"; Lex copies the material between "%{" and "%}" directly to the 'C'
file.
2. The rule section consists of the rules written in regular expression forms with
corresponding action. This action part is written in 'C' language. In other words,
each rule is made up of two parts: a pattern and an action separated by
whitespace. The lexer that lex generates will execute the action when it recognizes
the pattern. These patterns are regular expressions written in UNIX-style. C code
lines anywhere else are copied to an unspecified place in the generated 'C' file. If
the C code is more than one statement then it must be enclosed within braces { }.
When a lex scanner runs, it matches the input against the pattern in the rule
section. When it finds a match then it execute C code associated with that pattern.
3. The procedure section is in C code and it is copied by lex to C file. For large
program, it is better to have supporting code in a separate source file. If we change
lex file then it will not be affected.
• The following table shows the regular expressions used in lex.
Table 2.2
Regular Expression    Matches
. (dot)               any single character except the newline character "\n"
*                     zero or more occurrences of expression
+                     one or more occurrences of expression
?                     zero or one occurrence of regular expression, e.g. (– (digit)?)
\                     any escape character as in 'C' language
[a-z]                 range of characters from 'a' to 'z'; a range is indicated by '-' (hyphen)
[0-9]                 range of digits from 0 to 9
$                     matches end of line, as the last character of a regular expression
^                     matches beginning of line, as the first character of a regular expression
[012]                 zero or one or two
|                     "or", e.g. a | b is either a or b
()                    groups regular expressions, e.g. (a | b)
Example 4: A lex program to recognize the verbs had, have and has.
Solution:
/* Lex program to recognize verbs */
%{
#include<stdio.h>
%}
%%
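A minimal rules and procedure section for such a program might look like this sketch (the messages and the catch-all rules are illustrative):
[hH]ad    { printf("%s : verb, past tense\n", yytext); }
[hH]as    { printf("%s : verb, present tense\n", yytext); }
[hH]ave   { printf("%s : verb\n", yytext); }
[a-zA-Z]+ { printf("%s : not a recognized verb\n", yytext); }
.|\n      ;
%%
main()
{
    yylex();
}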
• This routine finds the error in the input. All versions of yacc also provide a simple
error reporting routine.
Note:
• When we use a lex scanner and a yacc parser together, the parser is the high level
routine.
• It calls the lexer yylex( ) whenever it needs a token from the input and lex returns a
token value to the parser.
• In this lex-yacc communication, the lexer and parser have to agree on what the token
codes are. Hence, using preprocessor #defines, we define an integer code for each token in the lex
program.
• For example: #define id 257
#define num 258
yacc can write a C header file containing all of the token definitions in y.tab.h.
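• A minimal sketch of the lex side of this agreement, reusing the token codes defined above (the patterns and the yylval declaration are illustrative assumptions):
%{
#include <stdlib.h>
#define id 257
#define num 258
int yylval;                          /* value passed along with the token code */
%}
%%
[0-9]+                { yylval = atoi(yytext); return num; }
[a-zA-Z][a-zA-Z0-9]*  { return id; }
[ \t\n]               ;
%%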
• To compile and execute a lex program on UNIX:
$ lex programme.l
$ cc lex.yy.c -ll (-ll is the lex library)
• We can also compile without the -ll option.
$ ./a.out (to run the program)
Example 5: Find the regular expression in LEX for the language of strings starting
with a and ending with d over {a, d}.
Solution: a[ad]*d
Example 6: Find the regular expression for a hexadecimal number in C language.
Solution: [-+]?0[xX][0-9a-fA-F]+
Example 7: Find the regular expression for the language of strings ending with 1 over {0, 1}.
Solution: [01]*1
Example 8: Write a regular expression for a floating point number in C language.
Solution: [-+]?[0-9]+(\.[0-9]+)?(E[-+]?[0-9]+)?
Example 9: A lex program to recognize the tokens if, else, <, >, <=, >=, =, <>, id and num.
Solution:
/* Lex specification to recognize tokens if, else, <, >, <=, >=, =, <>, id,
num */
%{
#define if 255
#define else 256
Output:
$ lex fact.l
$ cc lex.yy.c -ll
$ ./a.out
Enter number
5
Factorial of number 5 is 120
Enter number
4
Factorial of number 4 is 24
Example 13: A lex program to find the sum of the first n numbers.
Solution:
%{
#include<stdio.h>
int i, x, sum = 0;
%}
%%
[0-9]+ {
    x = atoi (yytext);
    for (i = 0; i <= x; i++)
    {
        sum = sum + i;
        printf ("%d ", i);
    }
    printf ("\nThe sum of first %d numbers is %d\n", x, sum);
    return (0);
}
%%
main()
{
    printf ("Enter number\n");
    yylex();
}
Output:
$ lex sum.l
Example 14: A lex program which finds the total words, lines and characters from
input ending with $.
Solution:
%{
#include<stdio.h>
#include<string.h>
int c=0, w=0, l=0;
%}
%%
[^ \t\n$]+ {
    w++; c += strlen (yytext); }
\n {
    l++; }
[$] {
    printf ("\n\n\t Total characters : %d\n\t Total words : %d\n\t Total lines : %d\n", c, w, l);
    return (0);
}
%%
main()
{
    yylex();
}
2.28
Compiler Construction Lexical Analysis (Scanner)
Output:
$ lex cwl.l
$ cc lex.yy.c -o cwl -ll
$ ./cwl
lexical analysis is the first phase of compiler
parser is the second phase $
Total characters : 72
Total words : 13
Total lines : 2
Example 15: A lex program to find the total vowels from the input.
Solution:
%{
#include<stdio.h>
int w=0, vc=0;
%}
%%
[ \t]+ { w++; }
[aeiouAEIOU] { vc++; }
\n {
    printf ("\n\nTotal vowels is %d", vc); return (0); }
. ;
%%
main()
{
    printf ("enter input \n");
    yylex();
}
Output :
Enter input
Compiler is easy course
Total vowels is 9.
Example 16: A lex program to display occurrences of the word Computer in a text.
Solution:
%{
#include<stdio.h>
PRACTICE QUESTIONS
Q. I Multiple Choice Questions:
1. Which is the process of converting a sequence of characters from source program
into a sequence of tokens?
(a) Lexical analysis (b) Syntax analysis
(c) Semantic analysis (d) None of the mentioned
2. A program which performs lexical analysis is termed as,
(a) tokenizer (b) lexical analyzer (lexer)
(c) scanner (d) All of the mentioned
3. Which is a state machine that takes a string of symbols as input and changes its
state accordingly?
(a) Finite automata (b) Finite compilation
(c) Finite translation (d) None of the mentioned
4. Which is a sequence of characters that matches the pattern for a token i.e.,
instance of a token?
(a) String (b) Pattern
(c) Lexeme (d) grammar
5. Which is sequence of characters that can be treated as a unit of information in the
source program?
(a) String (b) Pattern
(c) Lexeme (d) Token
6. Which expressions have the capability to express finite languages by defining a
pattern for finite strings of symbols?
(a) Grammar (b) Regular
(c) Language (d) None of the mentioned
7. A character sequence that cannot be scanned into any valid token is,
(a) a lexical pattern (b) a lexical token
(c) a lexical error (d) None of the mentioned
8. Which is a special character (eof) that cannot be part of the source program?
(a) Sentinel (b) Buffer pair
(c) Token (d) Pattern
9. The role of Lexical analyzer includes,
(a) Reads the source program, scans the input characters, group them into lexemes
and produce the token as output.
(b) Enters the identified token into the symbol table.
(c) Displays error message with its occurrence by specifying the line number.
(d) All of the mentioned
10. The process of forming tokens from an input stream of characters is called as,
(a) Characterization (b) Tokenization
(c) translocation (d) None of the mentioned
11. When expression sum=3+2 is tokenized then what is the token category of 3?
(a) Identifier (b) Integer Literal
(c) Assignment (d) Addition Operator
4. A pattern explains what can be a token, and these patterns are defined by means
of regular expressions.
5. If the lexical analyzer finds a token valid, it generates an error.
6. Lexical errors can be handled by the actions Deleting one character from the
remaining input, Inserting a missing character into the remaining input and
Replacing a character by another character.
7. The language defined by regular grammar is known as regular language.
8. The yylex() function is used to start or resume scanning.
9. The sentinel is a special character i.e., eof that can be part of the source program.
10. Lexical Analysis can be implemented with the Deterministic finite Automata.
11. The output is a sequence of tokens that is sent to the parser for syntax analysis.
12. A sequence of input characters that comprises a single token is called a lexeme.
13. In programming language, keywords, constants, identifiers, strings, numbers,
operators and punctuations symbols can be considered as lexemes.
14. The purpose of a lex program is to read an input stream and recognize tokens.
15. Lexical analyzer represents each token in terms of regular expression.
Answers
1. (T) 2. (F) 3. (T) 4. (T) 5. (F) 6. (T) 7. (T) 8. (T) 9. (F) 10. (T)
11. (T) 12. (T) 13. (F) 14. (T) 15. (T)
Q. IV Answer the following Questions:
(A) Short Answer Questions:
1. What is the purpose of lexical analysis?
2. Define lexing.
3. Define tokenization.
4. Define regular expression.
5. What is finite automata?
6. List any two lex library functions.
7. Define token recognition.
8. Define input buffering.
9. “Lex is a compiler”. Comment.
10. What is token?
11. What is the output of Lex program?
12. Lex is a scanner provided by Linux operating system. State true/ false.
13. Define pattern.
14. 'Lexical analyzer keeps the track of line number', state true or false.
%{
# include <stdio.h>
int n, i;
long fact = 1;
%}
%%
[0-9]+ { n = atoi (yytext);
for (i = 1; i <= n; i++)
fact = fact * i;
printf ("The factorial is %ld", fact);
return (0);
}
%%
main()
{
printf ("Enter the number");
yylex();
}
4. Write a short note on Input Buffering with the help of a diagram. [3 M]
Ans. Refer to Section 2.1.4.
October 2016
April 2018
1. State True or False. The yywrap() lex library function by default always
returns 1. [1 M]
Ans. Refer to Section 2.4.
2. Give the name of the file which is obtained after compilation of the lex program by
the Lex compiler. [1 M]
Ans. Refer to Section 2.4.
3. Write a LEX Program which identifies the tokens like id, if, for and while. [5 M]
Ans. Refer to Section 2.4.
CHAPTER 3
Syntax Analysis (Parser)
3.0 INTRODUCTION
• Syntax analysis is the second phase of a compiler. Syntax is the grammatical structure
of a language or program.
• Syntax analysis phase is also known as parsing.
• Parsing is the process of determining whether a string of tokens can be generated by a
grammar.
• Syntax analysis is the process of analyzing a string of symbols, either in natural
language, computer languages or data structures, conforming to the rules of a formal
grammar.
• Errors like missing commas or brackets and invalid variables are reported by the compiler in the syntax analysis or parsing phase.
• The input to the parsing phase is the stream of tokens from the lexical analyzer and the output is a parse tree.
• In general, syntax analysis means to check the syntax of the input statements with the
help of stream of tokens from lexical analysis and produce parse tree to the semantic
analysis.
• The program which performs syntax analysis is called as syntax analyzer or parser.
• A syntax analyzer or parser takes the input from a lexical analyzer in the form of
token streams.
• The process of constructing a derivation from a specific input sentence is called
parsing.
• The parser analyzes the source code (token stream) against the production rules to
detect any errors in the code and outputs a parse tree.
3.1 PARSERS
• The program performing syntax analysis is known as parser. The syntax analyzer
(parser) plays an important role in the compiler design.
• The main objective of the parser is to check the input tokens to analyze its
grammatical correctness.
• Parser is one of the components in a compiler, which determines whether a string of
tokens can be generated by a grammar.
• Parser is a program that obtains tokens from lexical analyzer and constructs the parse
tree which is passed to the next phase of compiler for further processing.
• Syntax analysis or parsing is a major component of the front-end of a compiler.
Parsing is the process of determining if a string of tokens can be generated by a
grammar.
• A parser scans an input token stream, from left to right and groups the tokens in order
to check the syntax of the input string by identifying a derivation by using the
production rules of the grammar.
• The syntax analyzer receives valid tokens from the scanner and checks them for
grammar and produces valid syntactical constructs. The syntax analyzer is also called
as parser.
Fig. 3.2: Parsing
• Syntax analysis or parsing means to check the tokens present in source program are
grouped in the syntactically correct format or not. Each language has its own syntax.
To define the syntax of a language, we make the use of grammars.
• Fig. 3.3 shows process of parsing.
• The first stage is the token generation or lexical analysis, by which the input character
stream is split into meaningful symbols defined by a grammar of regular expressions.
• The next stage is parsing or syntactic analysis, which is checking that the tokens form
an allowable expression.
• The final phase is semantic parsing or analysis, which is working out the implications
of the expression just validated and taking the appropriate action.
Source String → Parser [Lexical Analysis (Create Tokens) → Tokens → Syntactic Analysis (Create Tree)] → Parse Tree → Compiler, Interpreter or Translator → Output
Fig. 3.3: Parsing Process
• The words that cannot be replaced by anything are called terminals and the words
that must be replaced by other things are called non-terminals.
∴ Non-terminals → string of non-terminals + terminals.
• The grammatical rules are often called productions. To check whether the syntax of a language is correct or not, a grammar is used, called Context Free Grammar (CFG), which was introduced by Noam Chomsky in 1956. It is also called Type-2 grammar.
• Following terminologies are required in syntax analysis phase:
1. Alphabet: A set of characters allowed by the language. The individual members of this set are called terminals. An alphabet is a finite collection of symbols denoted by ∑. A symbol is an abstract entity; it cannot be formally defined, just as points in geometry cannot.
2. String: Group of characters from the alphabet. Any finite sequence of alphabets is
called a string.
3. Production Rules: They are the rules to be applied for splitting the sentence into appropriate syntactic form for analysis. Syntax analyzers follow production rules defined by means of context-free grammar.
These rules are in the following form:
Non-terminals → string of terminals and non-terminals.
This form is called BNF (Backus Naur Form) since on the left hand side only one non-terminal symbol is present.
4. Grammar (CFG): CFG is a collection of an alphabet of letters called terminals from which we are going to make the strings that will be the words of a language. A CFG consists of terminals, non-terminals, a start symbol and production rules. A terminal is a token from the alphabet of strings in the language. The symbols to be replaced are called non-terminals. The start symbol refers to the start of the sentence rules (mostly it starts with the capital letter S).
A finite set of productions in the BNF form:
Grammar: G = (NT, T, P, S)
where, NT → finite set of non-terminals
T → finite set of terminals
P → finite set of production rules and S is a start symbol.
Example of a grammar:
S → E
E → E + E
E → E * E
E → id
which is the same as: S → E, E → E + E | E * E | id.
Here, NT = {S, E}, T = {+, *, id}, P is the above set of four production rules, and S is the start symbol.
Fig. 3.4: Parse Tree for id + id * id
10. Ambiguous Grammar: A grammar G is said to be ambiguous if it has more than
one parse tree (leftmost or rightmost derivation) for at least one string.
Consider a grammar G whose production rules are,
S → E
E → E * E | E + E | id
Consider a sentence id + id * id. Now, if we want to generate parse tree for this
statement starting with S, then we can generate it with two methods.
Fig. 3.5: Rightmost and Leftmost Parse Trees for id + id * id
When same input sentence has more than one parse trees then the grammar is
said to be ambiguous.
In general, a CFG is called ambiguous if for at least one word in the language that it
generates, there are two possible derivations of the word that correspond to
different syntax tree.
In the above grammar, ambiguity is present because the operators + and * have
been given the same priority.
Ambiguity can be removed by giving precedence to the operators. The above
grammar can be modified as follows:
S → E
E → E+V|V
V → V * T|T
T → id
which is an unambiguous grammar.
• In the top-down parsing, we attempt to construct the derivation of the input string or
to build a parse tree starting from the top (root) to the bottom (leaves).
• In the bottom-up parsing, we build the parse tree, starting from the bottom (leaves)
and work towards the top.
• In both cases, input to the parser is scanned from left to right, one symbol at a time.
Fig. 3.6 shows the parsing methods.
Fig. 3.6: Parsing Methods
Sentence is a + b * c ⇒ id + id * id
Prediction Used     Sentential Form
E ⇒ T + E           T + E
T ⇒ V               V + E
V ⇒ id              id + E
E ⇒ T               id + T
T ⇒ V * T           id + V * T
V ⇒ id              id + id * T
T ⇒ V               id + id * V
V ⇒ id              id + id * id
Fig. 3.7: Top-down Parsing using Derivation
• Derivation according to a selected alternative is called a prediction. All possible sentences derivable from the grammar can be generated according to the top-down parse.
• If we have more than one production with the same L.H.S., then backtracking may be required.
• Backtracking means, if one derivation of a production fails, the syntax analyzer
restarts the process using different rules of same production. This technique may
process the input string more than once to determine the right production.
• Whenever, any derivation is produced during processing, it can be matched with the
input string.
• A successful match implies that further predictions should be made to continue the
parse. An unsuccessful match implies that some previous predictions have gone
wrong.
• At this stage, it is necessary to reject the previous predictions so that the new
predictions can be made.
Example, CFG is,
S → aAb
A → ab|b
Input string ω = abb
Fig. 3.8: Derivation of ω using Top-down Parsing
• After applying the unique S-production, the leftmost leaf 'a' matches the first input symbol, so we advance the input pointer to the second symbol 'b' of abb and expand A using its first alternative A → ab.
• But derivation is A → ab. Since 'a' does not match, we report failure and go back to A to
try for other alternative. Here, reset input pointer to position 2 and then we obtain.
Fig. 3.9: Parse Tree for ω = abb
• Here, we have parse tree for w and we halt. Using backtracking, parsing is completed
successfully.
Fig. 3.10: Backtracking and Trying the Next Production
• Backtracking is required in the next example and we shall suggest a way of keeping
track of the input when backtracking takes place.
Fig. 3.11: Steps in Top-down Parse
The leftmost leaf, labeled c, matches the first symbol of w, so we now advance the
input pointer to a, the second symbol of w and consider the next leaf, labeled A.
We can then expand A using the first alternative for A to obtain the tree of
Fig. 3.11 (b).
We now have a match for the second input symbol so we advance the input pointer to
d, the third input symbol and compare d against the next leaf, labeled b.
Since b does not match d, we report failure and go back to A to see whether there is
another alternative for A that we have not tried but that might produce a match.
In going back to A, we must reset the input pointer to position 2, the position it had
when we first came to A, which means that the procedure for A must store the input
pointer in a local variable.
We now try the second alternative for A to obtain the tree of Fig. 3.11 (c). The leaf a
matches the second symbol of ω and the leaf d matches the third symbol.
Since, we have produced a parse tree for ω, we halt and announce successful
completion of parsing.
• A backtracking parser is a non-deterministic recognizer of the language generated by
the grammar.
• The simplest way of top-down parsing is to use backtracking. A parser takes the
grammar and constructs the parse tree by selecting the production as per the guidance
initiated by left to right scanning of the input string.
• For example, if the input string is s= bcd and the given grammar has productions,
S → bX
X → d | cX
• For the construction of the parse tree for the string bcd, we start with the root labeled with the start symbol S.
• We have only one option for S, namely bX, and its first symbol (terminal b) matches the first symbol of the string bcd.
• Now the replacement of X must be done in such a way that the second leaf node in the derivation tree is 'c'.
• If X is replaced with 'd' then we will have to backtrack because the second symbol of the input string will not match the yield of the parse tree.
• Therefore the non-terminal X is replaced with cX. Finally the non-terminal X is replaced with 'd' so that the yield of the parse tree is the same as the input string.
• The construction of a parse tree is given in Fig. 3.12.
Fig. 3.12: Steps of Construction of a Parse Tree
• Elimination of backtracking in top-down parsing would have several advantages: parsing would become more efficient, and it would be possible to perform semantic actions and precise error reporting during parsing.
• Backtracking can be avoided by transforming the grammar in such a way that at each
step the choice of production that can lead us to solution can be easily identified.
• In other words, at each step, we can 'predict' which of the productions can lead us to
the complete derivation of the input string, if one exists.
• The idea behind a top-down predictive parser is that the current non-terminal being
processed combined with the next input symbol can guide the parser to take the
correct production rule eventually leading to the match of complete input string.
• The predictive parser is a type of top-down parser that does not require backtracking
in order to derive various input strings.
• This is possible because the grammar for the language is transformed such that
backtracking is not needed.
• What kind of transformations do we make to the grammar rules to suit a predictive
parser? There are two types of transformations done to the grammar in order to suit a
predictive parser. They are Elimination of left recursion and Left factoring.
Immediate left-recursion of the form A → Aα | β is eliminated by rewriting the productions as:
A → βA'
A' → αA' | ∈
After removing left-recursion from the A-productions:
A → aA'|bCA'
A' → BCA'|∈
∴ Grammar becomes (without left-recursion):
A → aA'|bCA'
A' → BCA'|∈
B → AB|b
C → a
Example 2: Eliminate left-recursion from following grammar:
S → Aa | b
A → Ac | Sd | ∈
Solution: Here, the non-terminal S is left-recursive because:
S ⇒ Aa ⇒ Sda
It is not immediate left-recursion.
To eliminate left-recursion, we substitute S-productions in A-productions and we
obtain,
A → Ac | Aad | bd |∈
Now eliminating immediate left-recursion, we get following A-productions.
A → bdA' | A'
A' → cA' | adA' | ∈
So the grammar becomes:
S → Aa | b
A → bdA' | A'
A' → cA' | adA' |∈
Example 3: Eliminate left-recursion from the following grammar:
S → (L) | a
L → L, S | S
Solution: The L-production has immediate left-recursion. After eliminating left-recursion we get,
S → (L) |a
L → SL'
L' → , SL' | ∈
• A useful method for manipulating grammars into a form suitable for top-down or
predictive parsing is left-factoring.
• Left factoring is a process of factoring out the common prefixes of alternatives. That is,
left factoring is a grammar transformation or manipulation which is useful for
producing a grammar suitable for recursive-decent parsing or predictive parsing.
• Left factoring is needed to avoiding backtracking problem. To left factor a grammar,
we collect all productions that have the same Left-Hand-Side (LHS) non-terminal and
begin with the same terminal symbols on the Right-Hand-Side (RHS).
• We combine the common strings into a single production and then append a new non-
terminal symbol to the end of this new production.
• Finally, we create a new set of productions using this new non-terminal for each of the
suffixes to the common production.
Definition:
• If A → αβ | αγ are two A-productions (where α is a non-empty string), then the left-factored productions become:
A → αA'
A' → β|γ
Example 4: Find out left-factoring grammar for following grammar:
S → aAbB | aAb
A → aA|a
B → bB | b
Solution: Applying the left-factoring rule for the S-production. (Here, α = aAb, β = B and γ = ∈).
The grammar becomes:
S → aAbS'
S' → B|∈
A → aA|a
B → bB|b
Now, apply left-factoring rule for A-production and B-production. The left-factored
grammar becomes:
S → aAbS'
S' → B|∈
A → aA'
A' → A|∈
B → bB'
B' → B|∈
3.3 RECURSIVE DESCENT PARSING [April 16, 17, 18, 19, Oct. 16, 17, 18]
• Recursive descent is a top-down parsing technique that constructs the parse tree from
the top and the input is read from left to right.
• Recursive descent parsing technique recursively parses the input to make a parse tree,
which may or may not require back-tracking.
• Recursive descent parser is a top-down parser.
3.3.1 Definition
• A parse that uses a set of recursive procedures to recognize its input without
backtracking is called as Recursive Descent Parsing (RDP).
• A recursive descent parsing program consists of a set of procedures, one for each non-terminal. Execution starts with the procedure for the start symbol.
• The execution ends or halts when procedure body scans the entire input string.
• General recursive-descent parsing may require backtracking, that is, it may require repeated scans over the input.
• A typical procedure for a non-terminal in a top-down parser is as follows:
void S()
{   choose an S-production S → X1 X2 X3 …… Xn;
    for (i = 1 to n)
    {   if (Xi is a non-terminal)
            call procedure Xi();
        else if (Xi is the current input symbol)
            advance the input to the next symbol;
        else
            error;
    }
}
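The procedures for E, E' and V fall on a page missing here; a sketch in the same style, consistent with the grammar used in this figure (E → VE', E' → +VE'|∈, V → TV', V' → *TV'|∈, T → (E)|id), is:
E ( ) /* procedure for E → VE' */
{
E: call V ( ) then EPRIME ( );
V ( );
EPRIME ( );
};
EPRIME ( ) /* procedure for E' → +VE'|∈ */
{
if input_symbol = '+'
{
ADVANCE ( );
V ( );
EPRIME ( );
}
};
V ( ) /* procedure for V → TV' */
{
T ( );
VPRIME ( );
};
VPRIME ( ) /* procedure for V' → *TV'|∈ */
{
if input_symbol = '*'
{
ADVANCE ( );
T ( );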
VPRIME ( );
};
};
T ( ) /* procedure for T → (E)|id */
{
if input_symbol = 'id'
ADVANCE ( );
else
if input_symbol = '('
{
ADVANCE ( );
E ( );
if input_symbol = ')'
ADVANCE ( )
else ERROR ( )
}
else ERROR ( )
};
Fig. 3.14: Recursive Descent Parsing for Grammar 3.6
Example 7: Write recursive descent parser (RDP) for the CFG given below:
S → aBAab|aBb
A → Aa|b
B → bB|b
Solution: Here, the A-productions have left-recursion.
After removing left-recursion grammar becomes.
S → aBAab|aBb
A → bA'
A' → aA'|∈
B → bB|b
Now, we need to apply left-factoring to the S-productions. Thus, we get the grammar,
S → aBS'
S' → Aab|b
A → bA'
A' → aA'|∈
B → bB|b
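The procedures for S, S' and A fall on a page missing here; a sketch in the book's style is given below. Note that both alternatives of S' → Aab|b begin with 'b' (FIRST(A) = {b}), so a single-lookahead procedure cannot truly distinguish them; the sketch simply tries Aab first:
S( ) /* procedure for S → aBS' */
{
if input_symbol = 'a'
{
ADVANCE( );
B( );
SPRIME( );
}
else
error( )
}
SPRIME( ) /* procedure for S' → Aab|b ; Aab is tried first */
{
A( );
if input_symbol = 'a'
{
ADVANCE( );
if input_symbol = 'b'
ADVANCE( );
else
error( );
}
}
A( ) /* procedure for A → bA' */
{
if input_symbol = 'b'
{
ADVANCE( );
APRIME( );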
}
}
APRIME() /* function A'( ) */
{
if input_symbol = 'a'
{
ADVANCE();
APRIME();
}
/* if input_symbol is not 'a', A' → ∈ applies : simply return */
}
B() /* function B() */
{
if input_symbol = 'b'
{
ADVANCE();
B();
}
else
if input_symbol = 'b'
{
ADVANCE();
}
}
Example 8: Construct recursive descent parser for the following grammar:
S → aA|AB
A → BA|a
B → SA|b
Solution: This grammar has no left-recursion and needs no left-factoring.
Recursive Descent Parser is as follows:
S( ) /* Procedure for S */
{ if (inputsymbol='a')
{ ADVANCE( )
A( );
}
else
{ A( );
B( );
}
}
A( ) /* procedure for A */
{ if (inputsymbol='a')
ADVANCE( )
else
{ B( );
A( );
}
}
B( ) /* procedure for B */
{ if(inputsymbol='b')
ADVANCE( )
else
{ S( );
A( );
}
}
Example 9: Construct recursive descent parser for the following grammar:
A → 0A0|A1|AA|1
Solution: Eliminating left-recursion (here α1 is 1, α2 is A, β1 is 0A0, β2 is 1) the grammar is,
A → 0A0A'|1A'
A' → 1A'|AA'|∈
The Recursive Descent Parser is as follows:
A( ) /* procedure for A */
{ if inputsymbol='0'
{ ADVANCE( );
A( );
if inputsymbol = '0'
{ ADVANCE( );
APRIME( );
}
}
else
{ if inputsymbol='1'
{ ADVANCE( );
APRIME( );
}
else
error( );
}
APRIME( ) /* procedure for A' */
{ if inputsymbol = '1'
{ ADVANCE( );
APRIME( );
}
else
{ A( );
APRIME( ) ;
}
}
Example 10: Construct recursive descent parser for the following grammar:
S → Aab|aBb
A → Aa|b
B → bB|b
Solution: Eliminate left-recursion from the A-productions:
S → Aab|aBb
A → bA'
A' → aA'|∈
B → bB|b
Find left-factoring from B-productions, we get
S → Aab|aBb
A → bA'
A' → aA'|∈
B → bB'
B' → B|∈
This grammar is now suitable for recursive descent parsing. The Recursive Descent Parser is as follows:
S( ) /* procedure for S */
{ if inputsymbol = 'a'
  { ADVANCE( );
    B( );
    if inputsymbol = 'b'
      ADVANCE( );
    else
      error( );
  }
  else
  { A( );
    if inputsymbol = 'a'
    { ADVANCE( );
      if inputsymbol = 'b'
        ADVANCE( );
      else
        error( );
    }
    else
      error( );
  }
}
A ( ) /* procedure for A */
{
if inputsymbol = 'b'
{ ADVANCE ( );
APRIME ( );
}
else
error ( )
}
APRIME( ) /* procedure for A' */
{ if inputsymbol='a'
{ ADVANCE ( );
APRIME( );
}
/* if inputsymbol is not 'a', A' → ∈ applies : simply return */
}
B( ) /* procedure for B*/
{
if inputsymbol='b'
{ ADVANCE( );
BPRIME( );
}
else
error( )
}
BPRIME( ) /* procedure for B' */
{
if inputsymbol = 'b'
B( ); /* B' → B */
/* else B' → ∈ : simply return */
}
Example 11: Construct recursive descent parser for the following grammar:
S → iStSeSf|iStSf|0
Solution: Applying left-factoring we get,
S → iStSS'|0
S' → eSf|f
Now Recursive Descent Parser is as follows:
S( )
{
if inputsymbol='i'
{ ADVANCE( );
S( );
if inputsymbol='t'
{ ADVANCE( );
S( );
SPRIME( );
}
else
error( );
}
else
if inputsymbol='0'
{ ADVANCE( );
}
else
error( );
}
SPRIME( )
{ if inputsymbol='e'
{ ADVANCE( );
S( );
if inputsymbol='f'
{ ADVANCE( );
}
else
error( );
}
else
if inputsymbol='f'
{ ADVANCE( );
}
else
error( );
}
V → TV'
V' → * TV'|∈
T → (E)|id
Stack        Input             Output
$E           id + id * id $
$E'V         id + id * id $    E → VE'   (here V is TOS: Top of Stack)
$E'V'T       id + id * id $    V → TV'
$E'V'id      id + id * id $    T → id
$E'V'        + id * id $       (pop, if TOS = input symbol)
$E'          + id * id $       V' → ∈
$E'V+        + id * id $       E' → + VE'
$E'V         id * id $         pop
$E'V'T       id * id $         V → TV'
$E'V'id      id * id $         T → id
$E'V'        * id $            pop
$E'V'T*      * id $            V' → * TV'
$E'V'T       id $              pop
$E'V'id      id $              T → id
$E'V'        $                 pop
$E'          $                 V' → ∈
$            $                 E' → ∈
Fig. 3.15: Predictive Parsing using Stack
• Here, all derivations are leftmost derivations, and the input is scanned from left to right. Hence the TOS is always the leftmost symbol of the current left-sentential form.
3.4.1 LL Parser
• An LL parser is a top-down parser for a subset of the Context-Free Grammars (CFGs).
• An LL parser parses the input from left to right and constructs a leftmost derivation of
the sentence. The class of grammars which are parsable in this way is known as the LL
grammars.
• An LL parser is called an LL(k) parser if it uses k tokens (or input strings) of look
ahead when parsing a sentence.
• If such a parser exists for a certain grammar and it can parse sentences of this
grammar without backtracking, then it is called an LL(k) grammar.
• Of these grammars, LL(1) grammars, although fairly restrictive, are very popular because the corresponding LL parsers only need to look at the next token to make their parsing decisions.
• A CFG whose parsing table has no multiply defined entries is called an LL(1) grammar.
Here, the “1” signifies the fact that the LL parser uses one input symbol of look ahead
to decide its next move.
• An LL parser (Left-to-right, Leftmost derivation) is a top-down parser for a restricted
context-free language. It parses the input from Left to right, performing Leftmost
derivation of the sentence.
• An LL parser is called an LL(k) parser if it uses k tokens of lookahead when parsing a
sentence. A grammar is called an LL(k) grammar if an LL(k) parser can be constructed
from it.
• A formal language is called an LL(k) language if it has an LL(k) grammar. The set of
LL(k) languages is properly contained in that of LL(k+1) languages, for each k ≥ 0.
• An LL parser parses the input from left to right and constructs a leftmost derivation of
the sentence are called LL(1) parser.
• LL(1) stands for Left-to-right parse, Leftmost derivation, 1-symbol lookahead.
• The LL parser is denoted as LL(k). The first L in LL(k) is parsing the input from left to
right, the second L in LL(k) stands for left-most derivation and k itself represents the
number of look aheads.
• Generally k = 1, so LL(k) may also be written as LL(1).
LL(k): L = Left-to-right scan of the input, L = Leftmost derivation, k = number of lookahead symbols.
• Both the stack and the input contains an end symbol $ to denote that the stack is empty
and the input is consumed.
• The parser refers to the parsing table to take any decision on the input and stack
element combination.
• The predictive parsers have the following components:
1. Input string which is to be parsed.
2. Stack consists of sequence of grammar symbols i.e. non-terminals and terminals of
the grammar.
3. Predictive parsing table, which is a 2D array [non-terminals, terminals]. It is a tabular implementation of recursive descent parsing, where a stack is maintained explicitly by the parser rather than implicitly by recursive calls in the language in which the parser is written.
4. An output stream.
• The Fig. 3.16 shows the model of predictive parser.
Fig. 3.16: Model of Table-driven Predictive Parser (input buffer a + b * c $; stack X Y Z $; predictive parsing program; parsing table M; output)
• To make the parser back-tracking free, the predictive parser puts some constraints on
the grammar and accepts only a class of grammar known as LL(k) grammar.
• Predictive parsing is possible only for the class of LL(k) grammars, which are the
context-free grammars for which there exists some positive integer k that allows a
recursive descent parser to decide which production to use by examining only the next
k tokens of input.
if (A∈T or A==$)
{
    if (A==r)
    {
        pop A from stack;
        remove r from input;
    }
    else
        ERROR();
}
else if (A∈V)
{
    if (PT[A,r] = A → B1B2....Bk)
    {
        pop A from stack;
        push Bk, Bk–1, ...., B1 onto the stack (B1 on top);
    }
    else
        ERROR();
}
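To make this driver concrete, here is a minimal runnable C sketch of the same loop for Grammar 3.2 (E → VE', E' → +VE'|∈, V → TV', V' → *TV'|∈, T → (E)|id). The encoding is illustrative, not from the text: 'A' stands for E', 'B' for V', and 'i' for the token id.
# include <stdio.h>
# include <string.h>
static const char *rule(char nt, char la)   /* parsing table PT[nt, la] */
{
    switch (nt) {
    case 'E': if (la == '(' || la == 'i') return "VA"; break;
    case 'A': if (la == '+') return "+VA";
              if (la == ')' || la == '$') return ""; break;   /* E' → ∈ */
    case 'V': if (la == '(' || la == 'i') return "TB"; break;
    case 'B': if (la == '*') return "*TB";
              if (la == '+' || la == ')' || la == '$') return ""; break;   /* V' → ∈ */
    case 'T': if (la == 'i') return "i";
              if (la == '(') return "(E)"; break;
    }
    return NULL;                            /* empty cell: error */
}
int parse(const char *w)                    /* w must end with '$' */
{
    char stack[100];
    int top = 0;
    stack[top++] = '$';
    stack[top++] = 'E';                     /* start symbol on top */
    while (top > 0) {
        char X = stack[--top];
        if (strchr("+*()i$", X)) {          /* X is a terminal or $ */
            if (X != *w++) return 0;        /* mismatch: reject */
        } else {                            /* X is a non-terminal */
            const char *r = rule(X, *w);
            if (r == NULL) return 0;        /* error entry: reject */
            for (int i = (int)strlen(r) - 1; i >= 0; i--)
                stack[top++] = r[i];        /* push RHS in reverse */
        }
    }
    return 1;                               /* stack emptied: accept */
}
int main(void)
{
    printf("%s\n", parse("i+i*i$") ? "accepted" : "rejected");
    return 0;
}
Running it on i+i*i$ reproduces the accepting trace of Fig. 3.15.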
3.4.5 Construction of Parse Table and LL(1) Parse Table [Oct. 17]
• Parse table is a two-dimensional array where each row is labeled with a non-terminal symbol and each column is marked with a terminal or the special symbol $. Each cell holds a production rule.
• Now, we construct the predictive parsing table. For the construction of a predictive
parsing table, we need two functions associated with a grammar G.
• These functions are FIRST and FOLLOW, which are required to write the entries of
parsing table.
1. FIRST: [Oct. 17]
• FIRST is computed for all grammar symbols that is for non-terminals and terminals.
Following rules are applied to find FIRST (X).
(i) If X is terminal symbol, then FIRST (X) = {X} i.e. the FIRST of terminal is terminal
itself.
(ii) If X → ∈ is a production, then FIRST (X) = {∈}.
(iii) If X is a non-terminal symbol having a production whose leftmost RHS symbol is a terminal, then that terminal is in FIRST (X), i.e. if X → aα where a ∈ T, then a ∈ FIRST (X).
(iv) If X is a non-terminal symbol having a production of the form X → AB … Z, where the RHS contains a sequence of non-terminals:
Now, if production A → a, then a ∈ FIRST (X).
If A → ∈ then FIRST (X) includes FIRST (B).
If B → ∈ then FIRST (X) includes FIRST (C), and so on.
If AB … Z ⇒* ∈ then ∈ ∈ FIRST (X).
2. FOLLOW:
• Follow is computed only for non-terminals.
• Following rules are applied to find FOLLOW (A):
(i) If S is the start symbol, then add $ to FOLLOW (S).
(ii) If there is a production A → αBβ, β ≠ ∈, then everything in FIRST (β) except ∈ is in FOLLOW (B).
(iii) If there is a production A → αB, or a production A → αBβ where FIRST (β) contains ∈, then everything in FOLLOW (A) is in FOLLOW (B).
Example 13: Construct FIRST and FOLLOW for the following grammar:
S → iCtSS'|a
S' → eS|∈
C→b
Solution:
FIRST (i) = {i}
FIRST (t) = {t}
FIRST (e) = {e}
FIRST (a) = {a}
FIRST (b) = {b}
FIRST (S) = {i, a}
FIRST (S') = {e, ∈}
FIRST (C) = {b}
FOLLOW (S) = {$, e}
(by rule (i), $ is added for the start symbol; e is added since S is followed by S' in S → iCtSS' and FIRST (S') = {e, ∈})
To compute FOLLOW (S'), consider S → iCtSS':
FOLLOW (S') = FOLLOW (S)
∴ FOLLOW (S') = {$, e}
To compute FOLLOW (C), consider S → iCtSS':
FOLLOW (C) = FIRST (tSS') = {t}
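The FIRST computation above can be checked mechanically. Below is a small illustrative C program (the encoding and all names are hypothetical, not from the text) that iterates the FIRST rules to a fixpoint for this example's grammar; 'Z' stands for S' and '#' for ∈.
# include <stdio.h>
# include <string.h>
#define NNT 3
static const char nts[NNT] = {'S', 'Z', 'C'};
static const char prod_lhs[] = "SSZZC";
static const char *prod_rhs[] = {"iCtSZ", "a", "eS", "#", "b"};
static char first[NNT][16];                       /* FIRST set of each non-terminal */
static int nt_index(char c)
{
    for (int i = 0; i < NNT; i++)
        if (nts[i] == c) return i;
    return -1;
}
static int add_sym(char *set, char c)             /* returns 1 if the set grew */
{
    if (strchr(set, c)) return 0;
    size_t n = strlen(set);
    set[n] = c; set[n + 1] = '\0';
    return 1;
}
int main(void)
{
    int changed = 1;
    while (changed) {                             /* iterate rules to a fixpoint */
        changed = 0;
        for (int p = 0; p < 5; p++) {
            int A = nt_index(prod_lhs[p]);
            const char *r = prod_rhs[p];
            int all_eps = 1;                      /* RHS prefix derives ∈ so far? */
            for (int k = 0; r[k] && all_eps; k++) {
                int B = nt_index(r[k]);
                all_eps = 0;
                if (B < 0) {                      /* terminal (or '#'): add it, stop */
                    changed |= add_sym(first[A], r[k]);
                } else {                          /* non-terminal: add FIRST(B) \ {#} */
                    for (const char *s = first[B]; *s; s++)
                        if (*s != '#') changed |= add_sym(first[A], *s);
                    if (strchr(first[B], '#')) all_eps = 1;
                }
            }
            if (all_eps) changed |= add_sym(first[A], '#');
        }
    }
    for (int i = 0; i < NNT; i++)
        printf("FIRST(%c) = { %s }\n", nts[i], first[i]);
    return 0;
}
It prints FIRST(S) = {i, a}, FIRST(S') = {e, ∈} and FIRST(C) = {b}, agreeing with the hand computation above.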
Example 14: Find FIRST and FOLLOW of the following grammar.
S → BC|AB
A → aAa|∈
B → bAa
C→∈
Solution: To compute FIRST:
FIRST (a) = {a}
FIRST (b) = {b}
FIRST (S) = {b, a}
FIRST (A) = {a, ∈}
FIRST (B) = {b}
FIRST (C) = {∈}
To compute FOLLOW:
FOLLOW (S) = {$}
To find FOLLOW (A):
Consider, (1) S → AB:
FOLLOW (A) ⊇ FIRST (B) = {b}
Consider, (2) A → aAa:
FOLLOW (A) ⊇ FIRST (a) = {a}
Consider, (3) B → bAa:
FOLLOW (A) ⊇ FIRST (a) = {a}
∴ FOLLOW (A) = {a, b}
To find FOLLOW (B):
Consider, S → BC:
FOLLOW (B) = FOLLOW (S) (∵ C ⇒ ∈)
Consider, S → AB:
FOLLOW (B) = FOLLOW (S)
∴ FOLLOW (B) = {$}
FOLLOW (C) = FOLLOW (S) = {$}
∴ FOLLOW (S) = {$}
FOLLOW (A) = {a, b}
FOLLOW (B) = {$}
FOLLOW (C) = {$}
Example 15: Find the FIRST and FOLLOW sets of the following grammar:
S → E#
E → E – T|T
T → F ↑ T|F
F → (E)|id
Solution:
FIRST (#) = {#}     FIRST (↑) = {↑}
FIRST (() = {(}     FIRST (–) = {–}
FIRST (id) = {id}   FIRST ()) = {)}
FIRST (S) = FIRST (E) = FIRST (T) = FIRST (F) = {(, id}
FOLLOW (S) = {$}
FOLLOW (E) = {#, –, )}
FOLLOW (T) = {#, –, )}
FOLLOW (F) = {#, –, ), ↑}
• Rows of the LL(1) table correspond to non-terminals of the LL(1) grammar. Each cell of the LL(1) table is either empty or contains a single grammar production.
• Therefore the table includes the parser actions to be taken based on the top value of
the stack and the current input symbol.
For example, consider the Grammar,
S → A | a
A → a
Find their FIRST and FOLLOW sets:
            FIRST    FOLLOW
S → A | a   {a}      {$}
A → a       {a}      {$}
Parsing Table:
        a                    $
S       S → A, S → a
A       A → a
• Here, we can see that there are two productions in the same cell. Hence, this grammar is not feasible for an LL(1) parser.
Example 16: Construct the predictive parsing table for Grammar 3.2.
Solution: 1. Start with the first production:
E → VE'
FIRST (V) = {(, id}
∴ At T [E, ( ] and T [E, id] make entry E → VE'.
2. E' → + VE'
T [E', +] make entry E' → + VE'.
3. E' → ∈
Follow (E') = { ), $ }
T [E', )] and T [E', $] make entry E' → ∈.
Similarly, we can fill parsing table entries for all productions. The predictive parsing
table of Grammar 3.2 is shown in Fig. 3.17.
The complete table (input symbols +, *, (, ), id, $) is:
M [E, (] = M [E, id] = E → VE'
M [E', +] = E' → + VE' ; M [E', )] = M [E', $] = E' → ∈
M [V, (] = M [V, id] = V → TV'
M [V', *] = V' → * TV' ; M [V', +] = M [V', )] = M [V', $] = V' → ∈
M [T, (] = T → (E) ; M [T, id] = T → id
Fig. 3.17: Predictive Parsing Table
Example 17: Check whether the following grammar is LL(1) or not:
S → AB
A → BS | b | ∈
B → AS | a
Solution: The above grammar is left-recursive since there is (non-immediate) recursion in the B-production.
B → AS|a (substitute the A-productions into the B-productions)
B → BSS|bS|S|a
After removing left-recursion:
B → bSB'|SB'|aB'
B' → SSB'|∈
Hence, grammar without left-recursion is,
S → AB
A → BS|b|∈
B → bSB'|SB'|aB'
B' → SSB'|∈
Now, calculate FIRST and FOLLOW sets,
FIRST (S) = {b, a, ∈}
FIRST (A) = {b, a, ∈}
FIRST (B) = {b, a, ∈}
FIRST (B') = {b, a, ∈}
FOLLOW (S) = {$, a, b}
FOLLOW (A) = {$, a, b}
FOLLOW (B) = {$, a, b}
FOLLOW (B') = ($, a, b}
Now, we construct predictive parsing table.
The parsing table entries (input symbols a, b, $) are:
M [S, a] = M [S, b] = M [S, $] = S → AB
M [A, a] = {A → BS, A → ∈}
M [A, b] = {A → BS, A → b, A → ∈}
M [A, $] = {A → BS, A → ∈}
M [B, a] = {B → aB', B → SB'}
M [B, b] = {B → bSB', B → SB'}
M [B, $] = B → SB'
M [B', a] = M [B', b] = M [B', $] = {B' → SSB', B' → ∈}
Fig. 3.18: Predictive Parsing Table
Since, parsing table has multiply defined entries, the given grammar is not LL(1).
Example 18: Check whether the following grammar is LL(1) or not?
S → BC | AB
A → aAa | ∈
B → bAa
C→∈
Solution: Find FIRST and FOLLOW of the grammar.
FIRST of the grammar:
FIRST (a) = {a}
FIRST (b) = {b}
FIRST (S) = {a, b}
FIRST (A) = {a, ∈}
FIRST (B) = {b}
FIRST (C) = {∈}
FOLLOW of the grammar:
FOLLOW (S) = {$}
FOLLOW (A) = {a, b}
FOLLOW (B) = {$}
FOLLOW (C) = {$}
Now construct the parsing table (input symbols a, b, $):
M [S, a] = S → AB ; M [S, b] = S → BC
M [A, a] = {A → aAa, A → ∈} ; M [A, b] = A → ∈
M [B, b] = B → bAa
M [C, $] = C → ∈
The above grammar is not an LL(1) grammar because there are multiple entries in the cell M [A, a] of the predictive parsing table.
Example 19: Check whether the following grammar is LL(1) or not?
S → AB
A → BS | b | ∈
Solution: (i) There is no production rule for B, ∴ B is useless.
(ii) Hence we can eliminate S → AB and A → BS.
(iii) Now S is useless, ∴ the grammar is useless.
(iv) It is not LL(1).
Example 20: Check whether the following grammar is LL (1) or not?
A → AcB | cD | D
B → bB | id
D → DaB | BbB | B
Solution: This grammar is not LL (1) because the grammar has left-recursion in the A-production and the D-production. Eliminating left-recursion we get,
A → cDA' | DA'
A' → cBA' | ∈
B → bB | id
D → BbBD' | BD'
D' → aBD' | ∈
Since both D-productions begin with the same symbol, apply left-factoring:
D → BD"
D" → bBD' | D' (D-productions after left-factoring)
Hence grammar is,
A → cDA' | DA'
A' → cBA' | ∈
B → bB | id
D → BD"
D" → bBD' | D'
D' → aBD' | ∈
Now, compute FIRST and FOLLOW:
FIRST (a) = {a} FIRST (A') = {c, ∈}
FIRST (c) = {c} FIRST (B) = {b, id}
FIRST (b) = {b} FIRST (D) = {b, id}
FIRST (id) = {id} FIRST (D') = {a, ∈}
FIRST (A) = {c, b, id} FIRST (D") = {b, a, ∈}
FOLLOW (A) = {$}
FOLLOW (B) = {c, b, a, $}
FOLLOW (A') = {$}
FOLLOW (D) = {c, $}
FOLLOW (D') = {c, $}
FOLLOW (D") = {c, $}
Now construct predictive parsing table.
3.44
Compiler Construction Syntax Analysis (Parser)
M [A, b] = A → DA' ; M [A, c] = A → cDA' ; M [A, id] = A → DA'
M [A', c] = A' → cBA' ; M [A', $] = A' → ∈
M [B, b] = B → bB ; M [B, id] = B → id
M [D, b] = D → BD" ; M [D, id] = D → BD"
M [D", b] = D" → bBD' ; M [D", a] = M [D", c] = M [D", $] = D" → D'
M [D', a] = D' → aBD' ; M [D', c] = M [D', $] = D' → ∈
Since no cell of the table has more than one entry, the transformed grammar is LL(1).
Example 21: Check whether the following grammar is LL (1).
S → abAB | Abc
A → Ba | ∈
B → bA | Aa | ∈
Solution: The grammar contains indirect left-recursion in the B-production. After substituting the A-production into the B-production:
S → abAB | Abc
A → Ba | ∈
B → bBa | ba | Baa | a | ∈
S → abAB | Abc
A → Ba | ∈
B → bBaB' | baB' | aB' | B'
B' → aaB' | ∈
• Bottom-up parsers construct parse trees starting from the leaves and work up to the
root.
• The Fig. 3.19 shows the bottom-up parser can be divided into two types namely,
Operator precedence parser and LR parser.
• LR parser further divided into:
1. Simple LR i.e. SLR(1).
2. Canonical LR (CLR).
3. Lookahead LR (LALR).
Fig. 3.19: Types of Bottom-up Parsing
Definition:
• An operator grammar in which exactly one of the precedence relations <⋅, = or ⋅> holds between any two terminals, and which has no ∈-production, is called an operator-precedence grammar.
Using Operator-Precedence Relations: Consider the operator grammar of expressions:
S → S + S |S * S| a
where, a is any identifier.
• We can derive the string a + a * a. Let $ be the end marker at both sides of the string; the precedence relations are shown in Fig. 3.20.
Fig. 3.20: Operator Precedence Relations
• We can use a stack for operator-precedence parsing as follows:
1. Let $ be the TOS and let the input string w$ have a pointer which scans the input ahead.
2. If TOS <⋅ pointer symbol or TOS = pointer symbol, then push the input symbol onto the stack (the pointer symbol is the symbol pointed to by the input pointer of the input string) and advance the pointer to the next symbol.
3. If TOS ⋅> pointer symbol, then pop the TOS until TOS <⋅ the terminal symbol most recently popped. Here, the string between <⋅ and ⋅> is the handle, which is reduced first.
4. Else error.
• Consider the above grammar and the precedence relations of Fig. 3.20. Let us use operator precedence parsing to find the handle (in the right-sentential form) for the string a + a * a.
$S +a*a$ push
$S+S* a$ push
$S+S*a $ TOS > '$' reduce
$S $ accept
Fig. 3.21: Operator Precedence Parsing
• Here, the parser finds the right end of the handle. So first S * S is solved and parse tree
is generated.
Fig. 3.22: Operator Precedence Relations
Note: Handling the unary operator is difficult in precedence parsing. The lexical analyzer should distinguish whether an operator is unary or not, before parsing.
Example 26: Find LEADING and TRAILING symbols for the following grammar:
S → S – B|B
B → B * A|A (Grammar 3.4)
A → (S)|id
Solution:
Solution: LEADING (S) = {–, *, (, id}
LEADING (B) = {*, ( , id}
LEADING (A) = {(, id}
TRAILING (S) = {–, *, ), id}
TRAILING (B) = {*, ), id}
TRAILING (A) = {), id}
Example 27: Find LEADING and TRAILING symbols of the following grammar:
E → E+T|T
T → T*F|F
F → (E) | id
Solution: LEADING (E) = {+, *, (, id}
LEADING (T) = {*, (, id}
LEADING (F) = {(, id}
TRAILING (E) = {+, *, id, )}
TRAILING (T) = {*, id, )}
TRAILING (F) = {), id}
Example 28: Compute LEADING and TRAILING for the following grammar.
S → (T) | a | ^
T → T, S | S (Oct. 16)
Solution: LEADING (S) = {(, a, ^}
LEADING (T) = {,, (, a, ^}
TRAILING (S) = {), a, ^}
TRAILING (T) = {,, ), a, ^}
Example 29: Find out whether the following grammar is an operator precedence grammar or not.
S → a | ^ | (R)
T → S, T | S
R → T
Solution: LEADING (S) = {a, ^, (}
LEADING (T) = {,, a, ^, (}
LEADING (R) = {,, a, ^, (}
TRAILING (S) = {a, ^, )}
TRAILING (T) = {,, a, ^, )}
TRAILING (R) = {,, a, ^, )}
The operator precedence relation table is as follows:
        a     ^     (     )     ,     $
a                         ⋅>    ⋅>    ⋅>
^                         ⋅>    ⋅>    ⋅>
(      <⋅    <⋅    <⋅    =     <⋅
)                         ⋅>    ⋅>    ⋅>
,      <⋅    <⋅    <⋅    ⋅>    <⋅
$      <⋅    <⋅    <⋅
In the above table, at most one precedence relation holds between any two terminals. So the above grammar is an operator precedence grammar.
Example 30: Find out whether the following grammar is operator precedence or not?
E → E + E | E * E | (E) | id
Solution: (1) Consider the first production and find its derivation.
E ⇒ E + E ⇒ E + E + E
By definition 2: the non-terminal E is immediately right of '+' and it derives a string whose first terminal is '+' (i.e. E → E + E). ∴ + <⋅ +.
(2) Now by definition 3: the non-terminal E is immediately left of '+' and derives a string whose last terminal symbol is +.
E ⇒ E + E ⇒ E + E + E
∴ + ⋅> +
Since two precedence relations hold between '+' and '+', the above grammar is not an operator precedence grammar.
Example 31: Find out whether the following grammar is an operator precedence grammar or not?
S → aAb +
A → (B|a
B → A)
Solution: The derivation is
S ⇒ aAb+ ⇒ a(Bb+ ⇒ a(A)b+ ⇒ a(a)b+
The derived string is not proper expression. So we cannot find operator precedence
relation. Therefore, grammar is not operator precedence grammar.
• The precedence functions f and g are constructed as follows:
1. Create function symbols fa and ga for each terminal a (including $).
2. Draw a directed edge from gb to fa if a <⋅ b; draw a directed edge from fa to gb if a ⋅> b; if a = b, group fa and gb together (no edge).
3. If the graph constructed by step 2 has a cycle, then no precedence functions exist. If there is no cycle, then the numerical value of fa is the length of the longest path beginning at the group of fa, and the numerical value of ga is the length of the longest path beginning at the group of ga (counted in number of nodes on the path).
Example 32: Consider the following precedence relation table:
        id    –     *     $
id            ⋅>    ⋅>    ⋅>
–      <⋅    ⋅>    <⋅    ⋅>
*      <⋅    ⋅>    ⋅>    ⋅>
$      <⋅    <⋅    <⋅
Fig. 3.23: Graph of Precedence Functions (nodes fid, gid, f–, g–, f*, g*, f$, g$)
Here, e.g. from fid the longest path we get is 4 (number of nodes present), which is fid → g* → f– → g$.
Similarly, we can find the longest path for all nodes and we get the precedence function table as follows:
        id    –     *     $
f       4     2     4     0
g       5     1     3     0
Example 33: Consider the following grammar and find the precedence functions.
E → E + T | T
T → T * F | F
F → (E) | id
Solution: First find the precedence relation table.
        +     *     (     )     id    $
+      ⋅>    <⋅    <⋅    ⋅>    <⋅    ⋅>
*      ⋅>    ⋅>    <⋅    ⋅>    <⋅    ⋅>
(      <⋅    <⋅    <⋅    =     <⋅
)      ⋅>    ⋅>          ⋅>          ⋅>
id     ⋅>    ⋅>          ⋅>          ⋅>
$      <⋅    <⋅    <⋅          <⋅
Compute the graph of the precedence functions.
Fig. 3.24: The Graph of Precedence Functions (nodes f+, g+, f*, g*, f(, g(, f), g), fid, gid, f$, g$)
We find the longest path for all nodes. We get the precedence function table as follows:
        +     *     (     )     id    $
f       2     4     0     4     4     0
g       1     3     5     0     5     0
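The precedence functions above can be used directly in code: f(a) < g(b) means a <⋅ b and f(a) > g(b) means a ⋅> b. A small illustrative C sketch (names and the use of 'i' for id are assumptions, not from the text; inputs are assumed to be valid terminals) is:
# include <stdio.h>
static int tindex(char c)           /* columns: + * ( ) id(='i') $ */
{
    switch (c) { case '+': return 0; case '*': return 1; case '(': return 2;
                 case ')': return 3; case 'i': return 4; case '$': return 5; }
    return -1;
}
static const int f[6] = {2, 4, 0, 4, 4, 0};
static const int g[6] = {1, 3, 5, 0, 5, 0};
static char compare(char a, char b) /* '<' : a <. b, '>' : a .> b */
{
    int fa = f[tindex(a)], gb = g[tindex(b)];
    return fa < gb ? '<' : fa > gb ? '>' : '=';
}
int main(void)
{
    printf("+ %c *\n", compare('+', '*'));   /* prints: + < *  */
    printf("* %c +\n", compare('*', '+'));   /* prints: * > +  */
    printf("i %c $\n", compare('i', '$'));   /* prints: i > $  */
    return 0;
}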
• Shift-reduce parsing uses two unique steps for bottom-up parsing. These steps are
known as shift-step and reduce-step.
o Shift Step: The shift step refers to the advancement of the input pointer to the next
input symbol, which is called the shifted symbol. This symbol is pushed onto the
stack. The shifted symbol is treated as a single node of the parse tree.
o Reduce Step: When the parser finds a complete grammar rule (RHS) and replaces
it to (LHS), it is known as reduce-step. This occurs when the top of the stack
contains a handle. To reduce, a POP function is performed on the stack which pops
off the handle and replaces it with LHS non-terminal symbol.
• A shift-reduce parser uses a stack to hold the grammar symbols while awaiting
reduction. During the operation of the parser, symbols from the input are shifted onto
the stack.
• If a prefix of the symbols on top of the stack matches the RHS of a grammar rule which
is the correct rule to use within the current context, then the parser reduces the RHS of
the rule to its LHS, replacing the RHS symbols on top of the stack with the non-
terminal occurring on the LHS of the rule.
• This shift reduce process continues until the parser terminates, reporting either
success or failure.
• The parser has input buffer, which consists of the string which is to be parsed. Initially
stack contains $ at the bottom of the stack. Let input string ends with $.
• Initially stack is $ and input is ω$.
• The parser shifts zero or more input symbols onto the stack until a handle is on top of the stack. When a handle is found at the TOS, it is reduced to the LHS of the appropriate production rule. This shift-or-reduce process continues until the stack holds the start symbol and the input is empty.
• At this point the string has been successfully parsed and the parser halts. While parsing, the parser can also detect errors, in which case parsing is not successful. Shift-reduce parsing using a stack is shown in Fig. 3.25.
• Now, consider the above Grammar 3.3.
Production Numbers:
1. S→S–B
2. S→B
3. B→B*A
4. B→A
5. A → id
• Shift-reduce parser is a type of bottom-up parser. It generates the parse tree from
leaves to the root.
Reduction:
• In a shift-reduce parser, the input string is reduced to the start symbol. This reduction traces a rightmost derivation in reverse: the derivation runs from the start symbol to the input string, and the parser retraces it from the input string back to the start symbol.
• To perform reduction, the parser must know the right end of the handle which is at
the top of the stack.
• Then the left end of the handle within the stack is located and the non-terminal to
replace the handle is decided.
Example 34: Perform the bottom-up parsing for the given string on the grammar, i.e., show the reduction for the string abbcde on the following grammar:
S → aABe
A → Abc | b
B → d
Solution:
S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde
(rightmost derivation; bottom-up parsing performs these reductions in the reverse order)
• We reduce the string abbcde to the start symbol S by applying the rightmost derivation in reverse at each step.
Handle: [April 19]
• Each replacement of the Right side of production by the left side in the process above
is known as "Reduction" and each replacement is called "Handle."
• A handle of a string is a substring that matches the right side of a production and
whose reduction to the non-terminal on the left side of the production represents one
step along the reverse of a rightmost derivation.
• During reduction we look for a match: the substring of the sentential form that matches the RHS of a production rule during reduction is called the "handle".
• Here, we use the rightmost derivation in the reverse direction, i.e. we start with the string of terminals which we want to parse and derive the start symbol in reverse:
S ⇒* ω (derivation)
Example 35: Consider the Grammar,
E→E+E
E→E*E
E → (E)
E → id
Perform Rightmost Derivation string id1 + id2 * id3. Find Handles at each step.
Solution:
E ⇒ E + E ⇒ E + E * E ⇒ E + E * id3 ⇒ E + id2 * id3 ⇒ id1 + id2 * id3
(rightmost derivation; bottom-up parsing performs these reductions in the reverse order)
Right-Sentential Form     Handle     Reducing Production
id1 + id2 * id3           id1        E → id
E + id2 * id3             id2        E → id
E + E * id3               id3        E → id
E + E * E                 E * E      E → E * E
E + E                     E + E      E → E + E
Example 36: Consider the grammar S → CC, C → cC | d. Show shift-reduce parsing (reduction) of the input string ccdd.
Solution:
S ⇒ CC ⇒ Cd ⇒ cCd ⇒ ccCd ⇒ ccdd
(rightmost derivation; bottom-up parsing performs these reductions in the reverse order)
Stack     Input String     Action
$         ccdd$            Shift
$c        cdd$             Shift
$cc       dd$              Shift
$ccd      d$               Reduce by C → d
$ccC      d$               Reduce by C → cC
$cC       d$               Reduce by C → cC
$C        d$               Shift
$Cd       $                Reduce by C → d
$CC       $                Reduce by S → CC
$S        $                Accept
Stack     Input String
$         w$
Fig. 3.27
3. Shift: Parser shifts zero or more input symbols onto the stack until the handle is on
top of the stack.
4. Reduce: The parser reduces (replaces) the handle on top of the stack to the left side of the production, i.e., the R.H.S. of the production is popped, and the L.H.S. is pushed.
5. Accept: Step 3 and Step 4 will be repeated until it has identified an error or until
the stack includes start symbol (S) and input Buffer is empty, i.e., it contains $.
Stack     Input String
$S        $
Fig. 3.28
6. Error: Signal discovery of a syntax error that has appeared and calls an error
recovery routine.
• For example, consider the grammar
S → aAcBe
A → Ab|b
B → d
and the string abbcde.
• We can reduce this string to S. We scan abbcde looking for substrings that match the right side of some production. The substrings b and d qualify.
• Let us select the left-most b and replace it with A, the left side of the production A → b; we obtain the string aAbcde.
• We can now identify that Ab, b, and d each match the right side of some production. Suppose this time we select to replace the substring Ab by A, the left side of the production A → Ab; we achieve aAcde.
• Then, replacing d by B, the left side of the production B → d, we achieve aAcBe. We can replace this string by S.
• Each replacement of the right side of a production by the left side in the process above is known as reduction.
Drawbacks of Shift-Reduce Parsing:
1. Shift/Reduce Conflict: Sometimes, the SR parser cannot determine whether to shift or to reduce.
2. Reduce/Reduce Conflict: Sometimes, the parser cannot determine which of several productions should be used for reduction.
Example 37: To show how the stack implementation of shift-reduce parsing is done, consider the grammar:
E → E + E
E → E * E
E → (E)
E → id and the input string id1 + id2 * id3.
Stack           Input String          Action
$               id1 + id2 * id3$      Shift
$id1            + id2 * id3$          Reduce by E → id
$E              + id2 * id3$          Shift
$E +            id2 * id3$            Shift
$E + id2        * id3$                Reduce by E → id
$E + E          * id3$                Shift
$E + E *        id3$                  Shift
$E + E * id3    $                     Reduce by E → id
$E + E * E      $                     Reduce by E → E * E
$E + E          $                     Reduce by E → E + E
$E              $                     Accept
Let us illustrate the above stack implementation.
→ Let the grammar be,
S → AA
A → aA
A→b
Let the input string ω be abab, so the parser sees abab$.
Stack Input String Action
$ abab$ Shift
$a bab$ Shift
$ab ab$ Reduce (A → b)
$aA ab$ Reduce (A → aA)
$A ab$ Shift
$Aa b$ Shift
$Aab $ Reduce (A → b)
$AaA $ Reduce (A → aA)
$AA $ Reduce (S → AA)
$S $ Accept
Conflicts during Shift-reduce Parsing:
• For some context-free grammars (CFGs), shift-reduce parsing cannot be used: the parser cannot decide whether the action is shift or reduce. This is called a shift-reduce conflict.
• Sometimes the parser cannot decide which of several possible reductions to make. This is called a reduce-reduce conflict.
• If the grammar is having conflict while parsing, then such a grammar is not LR
grammar. This conflict example we will discuss in later sections.
• An ambiguous grammar can never be LR. Grammars used in compiling process
usually fall in the LR (1) class, where one is lookahead symbol.
Example 38: Consider the grammar of the if-then-else statement.
stmt → if expr then stmt
| if expr then stmt else stmt
| other
Suppose the shift-reduce parser has the current stack configuration:
stack                       input
… if expr then stmt         else … $
Here, we cannot tell whether "if expr then stmt" is a handle. The parser either reduces "if expr then stmt" to stmt or shifts "else" onto the stack, depending upon what follows the else in the input.
Hence, in the above grammar shift-reduce conflict occurs during parsing. The above
grammar is ambiguous. It is possible to resolve the conflict and grammar can be made
unambiguous.
Example 39: The other conflict is the reduce-reduce conflict. Consider an example in which the lexical analyzer returns the token name id for all names, regardless of their type. The statement a(i, j) appears as the token stream id(id, id) to the parser.
Some productions are as follows:
1. stmt → id (param_list)
2. param_list → param_list, param
3. param → id
4. param_list → param
5. expr → id, expr
6. expr → id
If the current stack configuration is,
stack input
… id ( id , id ) …
Here, id is on top of the stack and id must be reduced, but by which production? Either
by production (3) or production (6). Here, reduce-reduce conflict occurs. The solution
is, we can change the grammar production and parse unambiguous grammar.
Fig. 3.29: Model of LR Parser (stack of states Sm, Sm–1, …, $; LR parsing program driven by the ACTION and GOTO tables; output)
Fig. 3.30: Types of LR Parser
• Let us see different parsers in detail.
SLR Parser: [April 16, 17, 18, 19, Oct. 16, 17, 18]
• SLR stands for "Simple LR Parser". It is very easy and cost-effective to implement.
• The SLR method constructs the parsing action and goto functions from the deterministic finite automaton that recognizes viable prefixes.
• It does not produce uniquely defined parsing action tables for all grammars, but it does succeed on several grammars for programming languages.
• Given a grammar G, we augment G to make G', and from G' we construct C, the canonical collection of sets of items for G'.
• We construct ACTION, the parsing action function, and GOTO, the goto function, from C using the simple LR parsing table construction technique. It requires us to know FOLLOW (A) for each non-terminal A of the grammar.
• A grammar having an SLR parsing table is said to be SLR(1).
Working of SLR Parser:
• SLR parsing can be done when a context-free grammar is given. In LR(0), 0 means there is no lookahead symbol.
• Fig. 3.31 shows the working of the SLR parser.
Fig. 3.31: Working of SLR Parser (from a Context Free Grammar (CFG) to the parsing table)
• The LR(0) item for Grammar G consists of a production in which symbol dot (.) is
inserted at some position in R.H.S of production.
• For example, for the production S →ABC, the generated LR (0) items will be,
S → · ABC
S → A · BC
S → AB · C
S → ABC ·
Production S → ε generates only one item, i.e., S → ·. The canonical LR(0) collection provides the states from which the SLR parsing table is constructed.
LALR Parser: [April 16, 17, Oct. 16, 17]
• LALR Parser is Look Ahead LR Parser. It is intermediate in power between SLR and
CLR parser.
• It is the compaction of CLR parser, and hence tables obtained in this will be smaller
than CLR parsing table.
• Fig. 3.33 shows the working of the LALR parser.
Fig. 3.33: Working of LALR Parser (from a Context Free Grammar (CFG) to the parsing table)
• For constructing the LALR(1) parsing table, the canonical collection of LR(1) items is
used.
• In LALR(1) parsing, the LR(1) items with the same productions but different lookaheads are merged to form a single set of items.
• It is essentially the same as CLR(1) parsing, except for one difference: the parsing table is smaller.
• The overall structure of all these LR Parsers is the same. There are some common
factors such as size, class of context-free grammar, which they support, and cost in
terms of time and space in which they differ.
• Let us see the comparison between SLR, CLR, and LALR Parser.
• For example: If the set of items I is {[S' → S ·], [S → S · – B]}, then GOTO (I, –) contains the following item:
S → S – · B (the dot is shifted by one position)
• Now, after dot immediate non-terminal B is present, so we again use closure
procedure.
S → S–·B
B → ·B*A
B → ·A
A → · (S)
A → · id
Construction of LR (0) Items:
Input: Augmented grammar G'.
Output: C, the canonical collection of LR (0) item sets.
Procedure:
1. C = {closure ([S' → · S])}, where S' is the start symbol.
2. For each set of items I in C and each grammar symbol X such that GOTO (I, X) is non-empty and not in C, add GOTO (I, X) to C, until no more sets of items can be added to C.
Example 40: Consider the grammar:
S → S – B|B
B → B * A|A (Grammar 3.6)
A → (S)|id
Find the canonical sets of LR (0) items.
Solution: Make the grammar augmented first.
S' → S
S → S–B
S → B
B → B*A
B → A
A → (S)
A → id
Now, find LR (0) sets of items:
I0: S' → · S
S → ·S–B
S → ·B
B → ·B*A
B → ·A
A → · (S)
A → · id
Goto (I0, S)
I1: S' → S ·
S → S·–B
Goto (I0, B)
I2: S → B ·
B → B·*A
Goto (I0, A)
I3: B → A ·
Goto (I0, ( )
I4 : A → (· S)
S → ·S–B
S → ·B
B → ·B*A
B → ·A
A → · (S)
A → · id
Goto (I0, id)
I5: A → id ·
Goto (I1, –)
I6: S → S – · B
B → ·B*A
B → ·A
A → · (S)
A → · id
Goto (I2, *)
I7: B → B * · A
A → · (S)
A → · id
Goto (I4, S)
I8: A → (S ·)
S → S·–B
Goto (I4, B) = I2
Goto (I4, A) = I3
Goto (I4, ( ) = I4 (repeated)
Goto (I4, id) = I5
Goto (I6, B)
I9: S → S – B ·
B → B·*A
Goto (I6, A) = I3
Goto (I6, ( ) = I4 (repeated)
Goto (I6, id) = I5
Goto (I7, A)
I10: B → B * A ·
Goto (I8, ))
I11: A → (S) ·
No more items can be added.
I0 to I11 form the canonical LR (0) collection. We can draw the DFA for the above LR (0) item sets, as shown in Fig. 3.34.
Fig. 3.34: DFA of the LR (0) Item Sets
2. Consider I1 item:
S' → S ·
S → S·–B
rule (c) is applicable for S' → S ·.
We get,
action [1, $] = accept
and for S → S · – B rule (a) is applicable and we get,
action [1, –] = shift 6 or S6
3. Consider I2 item:
S → B·
find FOLLOW (S), we get,
action [2, $] = action [2, –] = action [2, )] = reduce by S → B i.e. r2
B → B·*A
action [2, *] = shift 7 i.e. S7.
• Similarly, we find actions for all set of items of LR (0) and then parsing table entries
are made, which are shown in Fig. 3.35.
State        Action                                     Goto
        id     –     *     (     )     $          S     B     A
0       S5                 S4                     1     2     3
1              S6                      acc
2              r2    S7          r2    r2
3              r4    r4          r4    r4
4       S5                 S4                     8     2     3
5              r6    r6          r6    r6
6       S5                 S4                           9     3
7       S5                 S4                                 10
8              S6                S11
9              r1    S7          r1    r1
10             r3    r3          r3    r3
11             r5    r5          r5    r5
Fig. 3.35: SLR Parsing Table for Expression Grammar
So the grammar is SLR(1).
Remember:
"Every SLR(1) grammar is unambiguous" is always true, but "every unambiguous grammar is SLR(1)" is not always true: some unambiguous grammars are not SLR(1).
Example 41: Check whether the following grammar is SLR (1) or not.
S → L = R
S → R
L → *R (Grammar 3.7)
L → id
R → L
Solution: First we find the set of LR (0) items as follows:
I0: S' → · S
    S → · L = R
    S → · R
    L → · * R
    L → · id
    R → · L
I1: S' → S ·
I2: S → L · = R
    R → L ·
I3: S → R ·
I4: L → * · R
    R → · L
    L → · * R
    L → · id
I5: L → id ·
I6: S → L = · R
    R → · L
    L → · * R
    L → · id
I7: L → * R ·
I8: R → L ·
I9: S → L = R ·
Fig. 3.36: Canonical LR(0) Collection for Grammar 3.7
FOLLOW (S) = {$} FIRST (S) = {*, id}
FOLLOW (L) = {=, $} FIRST (L) = {*, id}
FOLLOW (R) = {$, =} FIRST (R) = {*, id}
States       Action                         Goto
        =        *      id     $        S     L     R
0                S4     S5               1     2     3
1                              accept
2       S6/r5                  r5
3                              r2
4                S4     S5                     8     7
5       r4                     r4
6                S4     S5                     8     9
7       r3                     r3
8       r5                     r5
9                              r1
Fig. 3.37: The SLR Parsing Table
Since there is a shift-reduce conflict at entry action [2, =] in the table, the above grammar is not SLR(1).
Example 42: Construct the SLR parsing table for the following grammar.
E → E+T|T
T → T*F|F
F → (E) | id
Solution: The augmented grammar is E' → E, with the productions numbered as:
1. E → E+T
2. E → T
3. T → T*F
4. T → F
5. F → (E)
6. F → id
The LR (0) items are as follows:
I0: E' → ⋅ E
E → ⋅E+T
E-productions are added
E → ⋅T
T → ⋅T*F
T-productions are added
T → ⋅F
F → ⋅ (E)
F-productions are added
F → ⋅ id
Goto (I0, E)
I1: E' → E ⋅
E → E⋅+T
Goto (I0, T)
I2: E → T⋅
T → T⋅*F
Goto (I0, F)
I3: T → F⋅
Goto (I0, ( )
I4: F → (⋅ E)
E → ⋅E+T
E → ⋅T
T → ⋅T*F
T → ⋅F
F → ⋅ (E)
F → ⋅ id
Goto (I0, id)
I5: F → id ⋅
Goto (I1, +)
I6: E → E+⋅T
T → ⋅T*F
T → ⋅F
F → ⋅ (E)
F → ⋅ id
Goto (I2, *)
I7: T → T*⋅F
F → ⋅ (E)
F → ⋅ id
Goto (I4, E)
I8: F → (E ⋅)
E → E⋅+T
Goto (I4, T) = I2
Goto (I4, F) = I3
Goto (I4, ( ) = I4 repeated
Goto (I4, id) = I5
Now, Goto(I6, T) = I9
I9: E → E+T⋅
T → T⋅*F
Goto (I6, F) = I3
Goto (I6, ( ) = I4 repeated
Goto (I6, id) = I5
Now, Goto (I7, F) = I10
I10: T → T*F⋅
Goto (I7, ( ) = I4
Goto (I7, id) = I5
Now, Goto (I8, ) ) = I11
I11: F → (E) ⋅
Goto (I8, +) = I6
Goto (I9, *) = I7
Now, compute FOLLOW of the non-terminals.
FOLLOW (E) = {$, +, )}
FOLLOW (T) = {$, +, *, )}
FOLLOW (F) = {$, +, *, )}
Now construct the parsing table.
        Action                          Goto
State |  id    +    *    (    )    $  |  E   T   F
  0   |  S5             S4            |  1   2   3
  1   |        S6                acc  |
  2   |        r2   S7       r2   r2  |
  3   |        r4   r4       r4   r4  |
  4   |  S5             S4            |  8   2   3
  5   |        r6   r6       r6   r6  |
  6   |  S5             S4            |      9   3
  7   |  S5             S4            |          10
  8   |        S6            S11      |
  9   |        r1   S7       r1   r1  |
 10   |        r3   r3       r3   r3  |
 11   |        r5   r5       r5   r5  |
Fig. 3.38: SLR Parsing Table for Expression Grammar
Consider the string id * id + id and show the moves of the LR parser, using the stack, to parse the string.
Symbols    | Input           | Action
$          | id * id + id $  | shift
$ id       | * id + id $     | reduce by F → id
$ F        | * id + id $     | reduce by T → F
$ T        | * id + id $     | shift
$ T *      | id + id $       | shift
$ T * id   | + id $          | reduce by F → id
$ T * F    | + id $          | reduce by T → T * F
$ T        | + id $          | reduce by E → T
$ E        | + id $          | shift
$ E +      | id $            | shift
$ E + id   | $               | reduce by F → id
$ E + F    | $               | reduce by T → F
$ E + T    | $               | reduce by E → E + T
$ E        | $               | accept
Fig. 3.39: Moves of LR Parser for id * id + id
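These moves can be reproduced mechanically by a table-driven parse loop. The sketch below assumes the ACTION and GOTO tables of Fig. 3.38 are encoded as dictionaries, e.g. ACTION[(0, "id")] = ("s", 5) and ACTION[(1, "$")] = ("acc",); that encoding is an assumption of the sketch, not part of the book's tables.

# (head, body length) per production, indexed by production number.
PRODS = [("E'", 1), ("E", 3), ("E", 1), ("T", 3),
         ("T", 1), ("F", 3), ("F", 1)]

def lr_parse(tokens, ACTION, GOTO):
    stack = [0]                      # the state stack; state 0 on top initially
    tokens = tokens + ["$"]
    pos = 0
    while True:
        act = ACTION.get((stack[-1], tokens[pos]))
        if act is None:
            return False             # blank entry: syntax error
        if act == ("acc",):
            return True
        kind, n = act
        if kind == "s":              # shift: push state n, advance the input
            stack.append(n)
            pos += 1
        else:                        # reduce by production n: pop |body| states,
            head, length = PRODS[n]  # then push GOTO[top, head]
            del stack[len(stack) - length:]
            stack.append(GOTO[(stack[-1], head)])

# lr_parse(["id", "*", "id", "+", "id"], ACTION, GOTO) would return True.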
Example 43: Check whether the following grammar is SLR(1) or not.
S → A | B
A → aA | b
B → dB | b
Solution: The augmented grammar is,
S' → S
1. S → A
2. S → B
3. A → aA
4. A → b
5. B → dB
6. B → b
Now, find LR (0) set of items:
I0: S' → ⋅ S
S → ⋅A
S → ⋅B
A → ⋅ aA
A → ⋅b
B → ⋅ dB
B → ⋅b
Goto (I0, S)
I1: S' → S ⋅
Goto (I0, A)
I2: S → A ·
Goto (I0, B)
I3: S → B ·
Goto (I0, a)
I4: A → a ⋅ A
A → ⋅ aA
A → ⋅b
Goto (I0, b)
I5: A → b ·
B → b ·
Goto (I0, d)
I6: B → d ⋅ B
B → ⋅ dB
B → ⋅b
Goto (I4, A)
I7: A → aA ⋅
Goto (I4, a) = I4
Goto (I4, b)
I8: A → b ⋅
Goto (I6, B)
I9: B → dB ⋅
Goto (I6, d) = I6
Goto (I6, b)
I10: B → b ⋅
No more items can be added.
FOLLOW (S) = FOLLOW (A) = FOLLOW (B) = {$}
A → ⋅ Ab
A → ⋅b
Goto (I0, a)
I3: S → a ⋅ A
A → ⋅ Ab
A → ⋅b
Goto (I2, A)
I4: S → bA · B
A → A⋅b
B → ⋅ aB
B → ⋅a
Goto (I2, b)
I5: A → b ⋅
Goto (I3, A)
I6: S → aA ⋅
A → A⋅b
Goto (I3, b) = I5 (repeated)
Goto (I4, B)
I7: S → bAB ⋅
Goto (I4, b)
I9: A → Ab ⋅
Goto (I4, a)
I8: B → a ⋅ B
B → a⋅
B → ⋅ aB
B → ⋅a
Goto (I6, b) = I9
Goto (I8, B)
I10: B → aB ·
Goto (I8, a) = I8
No more items can be added.
Symbol | FIRST  | FOLLOW
S'     | {b, a} | {$}
S      | {b, a} | {$}
A      | {b}    | {a, b, $}
B      | {a}    | {$}
Goto (I0, A)
I2: S → A ⋅
Goto (I0, B)
I3: A → B ⋅ ; C
B → B⋅;d
Goto (I0, b)
I4: B → b ⋅ d
Goto (I3, ;)
I5: A → B ; ⋅ C
C → ⋅ ae
C → ⋅a;C
B → B;⋅d
Goto (I4, d)
I6: B → bd ⋅
Goto (I5, C)
I7: A → B ; C ⋅
Goto (I5, a) = I8
I8: C → a ⋅ e
C → a⋅;C
Goto (I5, d)
I9: B → B ; d ⋅
Goto (I8, e)
I10: C → ae ⋅
Goto (I8, ;)
I11: C → a ; ⋅ C
C → ⋅ ae
C → ⋅ a ; C
Goto (I11, C)
I12: C → a ; C ⋅
Goto (I11, a) = I8 (repeated)
No more items can be added.
Now, compute FOLLOW.
FOLLOW (S) = FOLLOW (A) = FOLLOW (C) = {$}
FOLLOW (B) = {;}
Example 46: Check whether the following grammar is SLR(1) or not? [April 16]
S → A and A | A or A
A → id | (S)
Solution: The augmented grammar is,
S' → S
S → A and A … (1)
S → A or A … (2)
A → id … (3)
A → (S) … (4)
I0: S' → · S
S → · A and A
S → · A or A
A → · id
A → · (S)
I1: Goto (I0, S)
S' → S ·
I2: Goto (I0, A)
S → A · and A
S → A · or A
I3: Goto (I0, id)
A → id ·
I4: Goto (I0, ( )
A → (· S)
S → · A and A
S → · A or A
A → · id
A → · (S)
I5: Goto (I2, and)
S → A and · A
A → · id
A → · (S)
I6: Goto (I2, or)
S → A or · A
A → · id
A → · (S)
Solution: 1. To check whether the grammar is SLR (1):
Augmented grammar is:
S' → S
1. S → SA
2. S → A
3. A → a
Now, compute LR (0) set of items.
I0: S' → · S
S → · SA
S → ·A
A → ·a
Goto (I0, S)
I1: S' → S ·
S → S·A
A → ·a
Goto (I0, A)
I2: S → A ·
Goto (I0, a)
I3: A → a ·
Goto (I1, A)
I4: S → SA ·
Goto (I1, a) = I3
FOLLOW (S) = FOLLOW (A) = {$, a}
SLR Table:
        Action         Goto
State |  a     $    |  S   A
  0   |  S3         |  1   2
  1   |  S3    acc  |      4
  2   |  r2    r2   |
  3   |  r3    r3   |
  4   |  r1    r1   |
There are no multiply-defined entries, so the grammar is SLR (1).
2. To check that the grammar is not LL (1):
The grammar is left-recursive. After eliminating the left-recursion we get,
S → AS'
S' → AS' | ∈
A → a
The FIRST and FOLLOW are as follows:
FIRST (S) = {a}
FIRST (A) = {a}
FIRST (S') = {a, ∈}
FOLLOW (S) = {$}
FOLLOW (S') = {$}
FOLLOW(A) = {$}
Construction of the LL (1) parsing table for the transformed grammar:
       a           $
S   | S → AS'   |
A   | A → a     |
S'  | S' → AS'  | S' → ∈
This table has no multiply-defined entries, so the transformed grammar is LL (1). However, the original grammar is left-recursive, and no left-recursive grammar can be LL (1).
Therefore, the original grammar is not LL (1).
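The LL(1) table construction just carried out can be sketched in Python. The FIRST and FOLLOW sets are hard-coded from the computation above, and EPS is an assumed marker standing for ∈.

EPS = "eps"                                   # stands for the empty string
PRODS = [("S", ("A", "S'")), ("S'", ("A", "S'")), ("S'", (EPS,)), ("A", ("a",))]
FIRST = {"S": {"a"}, "S'": {"a", EPS}, "A": {"a"}, "a": {"a"}, EPS: {EPS}}
FOLLOW = {"S": {"$"}, "S'": {"$"}, "A": {"$"}}

table = {}
for head, body in PRODS:
    first_body = FIRST[body[0]]
    for t in first_body - {EPS}:              # put A -> alpha in M[A, FIRST(alpha)]
        table.setdefault((head, t), []).append(body)
    if EPS in first_body:                     # and in M[A, FOLLOW(A)] if alpha => eps
        for t in FOLLOW[head]:
            table.setdefault((head, t), []).append(body)

print({k: v for k, v in table.items() if len(v) > 1})   # {} : no conflicts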
then add [B → · γ, b] to I
until no more items can be added to I;
}
2. goto (I, X):
{
If [A → α · Xβ, a] is in the set of items I,
add [A → αX · β, a] to goto (I, X), and take its closure.
}
3. To find the LR(1) items:
Let A = {closure ([S' → · S, $])};
for each set of items I in A and each grammar symbol X, add goto (I, X) to A if it
is not already present, until no more items can be added to A.
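A minimal sketch of the LR(1) closure step, assuming items are (head, body, dot, lookahead) tuples and that a helper first_of, computing FIRST of a string of symbols, is available:

def closure_lr1(items, grammar, nonterminals, first_of):
    items, changed = set(items), True
    while changed:
        changed = False
        for head, body, dot, a in list(items):
            if dot < len(body) and body[dot] in nonterminals:
                beta = body[dot + 1:]
                for b in first_of(beta + (a,)):      # lookaheads = FIRST(beta a)
                    for rhs in grammar[body[dot]]:
                        if (body[dot], rhs, 0, b) not in items:
                            items.add((body[dot], rhs, 0, b))  # [B -> . gamma, b]
                            changed = True
    return frozenset(items)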
Example 48: Consider the following grammar:
S → AA
A → aA|b
Solution: First make the grammar augmented:
S' → S
S → AA
A → aA|b
Now, find LR(1) set as follows:
Start with [S' → · S, $]
I0: S' → · S , $
Since the symbol after the dot, S, is a non-terminal, use the closure procedure to add items.
Matching the pattern [A → α · Bβ, a]: here A = S', B = S, α = ∈, β = ∈ and a = $,
so first (βa) = first (∈$) = {$}.
So we add the item
S → · AA, $
Now the symbol after the dot is again a non-terminal, A, so apply the closure procedure again.
For the item S → · AA, $:
here A = S, α = ∈, B = A, β = A and a = $ (matching A → α · Bβ), and
first (βa) = first (A$) = {a, b}.
So we add the items for A as follows, with lookaheads a and b:
A → · aA, a|b
A → · b, a|b
Goto (I0, S)
I1: S' → S ·, $
Goto (I0, A)
I2: S → A · A, $
(while using goto function lookahead will not change)
Now, we have to add A-productions using closure rule
A → α · Bβ, $
α = A, B = A, β = ∈ and a = $
∴ first (βa) = $
we add,
A → · aA, $
A → · b, $
The I2 is
I2: S → A · A, $
A → · aA, $
A → · b, $
Similarly, we proceed further,
Goto (I0, a)
I3: A → a · A, a|b
A → · aA, a|b
A → · b, a|b
Goto (I0, b)
I4: A → b ·, a|b
Goto (I2, A)
I5: S → AA ·, $
Goto (I2, a)
I6: A → a · A, $
A → · aA, $
A → · b, $
Goto (I2, b)
I7: A → b ·, $
Goto (I3, A)
I8: A → aA ·, a|b
Goto (I6, A)
I9: A → aA ·, $
Goto (I6, a) = I6
Goto (I6, b) = I7
No more items can be added.
I0 to I9 form the set of LR(1) items. The DFA built using the goto function is shown in Fig. 3.40.
[Transition diagram omitted: states I0 to I9 connected by goto transitions on S, A, a and b]
Fig. 3.40: DFA for goto Function
• Every SLR(1) grammar is also LR(1), and an SLR parser may have fewer states than an LR(1)
parser for the same grammar, because LR(1) parsing splits states according to lookaheads.
• The LR(1) parsing table for the grammar of Example 48 is shown in Fig. 3.41.
        Action              Goto
State |  a    b    $    |  S   A
  0   |  s3   s4        |  1   2
  1   |            acc  |
  2   |  s6   s7        |      5
  3   |  s3   s4        |      8
  4   |  r3   r3        |
  5   |            r1   |
  6   |  s6   s7        |      9
  7   |            r3   |
  8   |  r2   r2        |
  9   |            r2   |
Fig. 3.41: Canonical Parsing Table
Example 49: Check whether the following grammar is LALR(1) or not.
S → aAd | bBd | aBe | bAe
A → c
B → c (Grammar 3.8)
Solution: The augmented grammar is S' → S. The LR(1) items are:
I0: S' → · S, $
S → · aAd, $
S → · bBd, $
S → · aBe, $
S → · bAe, $
Goto (I0, S)
I1: S' → S ·, $
Goto (I0, a)
I2: S → a · Ad, $
S → a · Be, $
A → · c, d
B → · c, e
Goto (I0, b)
I3: S → b · Bd, $
S → b · Ae, $
B → · c, d
A → · c, e
Goto (I2, A)
I4: S → aA · d, $
Goto (I2, B)
I5: S → aB · e, $
Goto (I2, c)
I6: A → c ·, d
B → c ·, e
Goto (I3, B)
I7: S → bB · d, $
Goto (I3, A)
I8: S → bA · e, $
Goto (I3, c)
I9: B → c ·, d
A → c ·, e
Goto (I4, d)
I10: S → aAd ·, $
Goto (I5, e)
I11: S → aBe ·, $
Goto(I7, d)
I12 : S → bBd ·, $
Goto (I8, e)
I13: S → bAe ·, $
The LR(1) parsing table is shown in Fig. 3.43.
        Action                                Goto
State |  a    b    c    d     e     $   |  S   A   B
  0   |  S2   S3                       |  1
  1   |                           acc  |
  2   |            S6                  |      4   5
  3   |            S9                  |      8   7
  4   |                 S10            |
  5   |                       S11      |
  6   |                 r5    r6       |
  7   |                 S12            |
  8   |                       S13      |
  9   |                 r6    r5       |
 10   |                           r1   |
 11   |                           r3   |
 12   |                           r2   |
 13   |                           r4   |
Fig. 3.43: The LR(1) Parsing Table
7 S12
8 S13
10 r1
11 r3
12 r2
13 r4
Fig. 3.44: LALR(1) Table for Grammar 3.8
Merging the LR(1) states I6 and I9, which have the same core, gives a state with items [A → c ·, d/e] and [B → c ·, d/e]; this produces a reduce-reduce conflict.
So the grammar is not LALR(1).
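The merge-and-check step behind this conclusion can be sketched as follows; the tuple encoding of items is the same assumed form used earlier, with I6 and I9 written out by hand.

from collections import defaultdict

I6 = {("A", ("c",), 1, "d"), ("B", ("c",), 1, "e")}
I9 = {("B", ("c",), 1, "d"), ("A", ("c",), 1, "e")}

def core(state):                       # the core ignores the lookaheads
    return frozenset((h, b, d) for h, b, d, _ in state)

merged = defaultdict(set)
for state in (I6, I9):                 # states with equal cores are unioned
    merged[core(state)] |= state

for state in merged.values():
    reduces = defaultdict(set)         # lookahead -> heads reducible on it
    for h, b, d, a in state:
        if d == len(b):
            reduces[a].add(h)
    for a, heads in reduces.items():
        if len(heads) > 1:
            print("reduce-reduce conflict on", a, "between", sorted(heads))

Running it reports conflicts on both d and e between A → c and B → c, exactly the failure described above.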
Example 50: Check whether the following grammar is LR (1) or not.
S → bA | a
A → CB | CBe
B → dB
Solution: In the above grammar, C is a useless symbol (it has no productions). After eliminating the productions containing C we get,
S → bA | a
B → dB
Now A has no remaining productions, and B never derives a terminal string, so A and B are also useless. The grammar therefore reduces to S → a.
The augmented grammar is,
S' → S
S → a
Now find LR (1) items:
I0: S' → ⋅ S, $
S → ⋅ a, $
Goto (I0, S)
I1: S' → S ⋅, $
Goto (I0, a)
I2: S → a ⋅, $
The canonical LR parsing table is as follows:
        Action        Goto
State |  a      $   |  S
  0   |  S2         |  1
  1   |        acc  |
  2   |        r1   |
There are no conflicting entries, so the (reduced) grammar is LR (1).
Goto (I2, +)
I6: E → E + ⋅ T, + | =
T → ⋅ T * a, + | = | *
T → ⋅ a, + | = | *
Goto (I4, *) = I7
I7: T → T * ⋅ a, + | = | *
Goto (I5, E)
I8: S → E = E ⋅, $
E → E ⋅ + T, + | $
Goto (I5, T)
I9: E → T ⋅, + | $
T → T ⋅ * a, + | $ | *
Goto (I5, a)
I10: T → a ⋅,$ | * | +
Goto (I6, T)
I11: E → E + T ⋅, + | =
T → T ⋅ * a, + | = | *
Goto (I6, a)
I12: T → a ⋅, + | = | *
Goto (I7, a)
I13: T → T * a ⋅, + | = | *
Goto (I8, +)
I14: E → E + ⋅ T, + | $
T → ⋅ T * a, + | $ | *
T → ⋅ a, + | $ | *
Goto (I9, *)
I15: T → T * ⋅ a, + | $ | *
Goto (I14, T)
I16: E → E + T ⋅, + | $
T → T ⋅ * a, $ | + | *
Goto (I14, a)
I17: T → a ⋅, + | $ | *
Goto (I15, a)
I18: T → T * a ⋅, + |$| *
3. A → BB
4. A → b
5. B → aAb
6. B → a
FIRST (S) = FIRST (A) = {b, a} FIRST (B) = {a}
FOLLOW (S) = {$, a} FOLLOW (A) = {a, b} FOLLOW (B) = {$, a, b}
Now find out LR (1) items:
I0: S' → · S, $
S → · AB, $
A → · BSB, a
A → · BB, a
A → · b, a
B → · aAb, a | b
B → · a, a | b
Goto (I0, S)
I1: S' → S ·, $
Goto (I0, A)
I2: S → A · B, $
B → · aAb, $
B → · a, $
Goto (I0, B)
I3: A → B · SB, a
A → B · B, a
S → · AB, a
A → · BSB, a
A → · BB, a
A → · b, a
B → · aAb, a | b
B → · a, a | b
Goto (I0, b)
I4: A → b ·, a
Goto (I0, a)
I5: B → a · Ab, a | b
B → a ·, a | b
A → · BSB, b
A → ⋅ BB, b
A → · b, b
B → · aAb, a | b
B → · a, a | b
Goto (I2, B)
I6: S → AB ·, $
Goto (I2, a)
I7: B → a · Ab, $
B → a ·, $
A → · BSB, b
A → · BB, b
A → · b, b
B → · aAb, a | b
B → · a, a | b
Goto (I3, S)
I8: A → BS · B, a
B → · aAb, a
B → · a, a
Goto (I3, B)
I9: A → BB ·, a
A → B · SB, a
A → B · B, a
S → · AB, a
A → · BSB, a
A → · BB, a
A → · b, a
B → · aAb, a | b
B → · a, a | b
Goto (I3, A)
I10: S → A · B, a
B → · aAb, a
B → · a, a
Goto (I3, b) = I4
Goto (I3, a) = I5
Goto (I5, A)
I11: B → aA · b, a | b
Goto (I5, B)
I12: A → B · SB, b
A → B · B, b
S → · AB, a
A → · BSB, a
A → · BB, a
A → · b, a
B → · aAb, a | b
B → · a, a | b
Goto (I5, a)
I13: B → a ·, a | b
B → a · Ab, a | b
A → · BSB, a | b
A → · BB, a | b
A → · b, a | b
Goto (I5, b)
I14: A → b ·, b
Goto (I7, A)
I15: B → aA · b, $
Goto (I7, B) = I12
Goto (I7, b) = I13
Goto (I7, a) = I5
Goto (I8, B)
I16: A → BSB ·, a
Goto (I8, a)
I17: B → a · Ab, a
B → a ·, a
A → · BSB, b
A → · BB, b
A → · b, b
B → · aAb, a | b
B → · a, a | b
Goto (I9, S) = I8
Goto (I9, B) = I9
Goto (I9, A) = I10
Goto (I9, b) = I4
Goto (I10, B)
I18: S → AB ·, a
Goto (I10, a) = I15
Goto (I11, b)
I19: B → aAb ·, a | b
Goto (I12, S)
I20: A → BS · B, b
B → · aAb, b
B → · a, b
Goto (I12, B)
I21: A → BB ·, b
A → B · SB, a
A → B · B, a
S → · AB, a
A → · BSB, a
A → · BB, a
A → · b, a
B → · aAb, a | b
B → · a, a | b
Goto (I12, a) = I17
Goto (I12, b) = I4
Goto (I15, b)
I22: B → aAb ·, $
Goto (I17, A)
I23: B → aA · b, a
Goto (I17, B) = I12
Goto (I17, a) = I5
Goto (I16, A)
I24: B → aA · b, a | b
Goto (I16, B)
I25: A → B · SB, a | b
A → B · B, a | b
S → · AB, a | b
A → · BSB, a | b
A → · BB, a | b
B → · aAb, a | b
B → · a, a | b
Goto (I19, B)
I26: A → BSB ·, b
Goto (I19, a)
I27: B → a · Ab, b
B → a ·, b
A → · BSB, b
A → · BB, b
A → · b, b
B → · aAb, a | b
B → · a, a | b
Goto (I20, S) = I8
Goto (I20, B) = I9
Goto(I20, A) = I10
Goto (I20, b) = I4
Goto (I20, a) = I5
Goto (I24, b)
I28: B → aAb ·, a | b
No more items can be added.
So construct LR(1) parsing table,
Action Goto
State
a b $ S A B
0 S5 S4 1 2 3
1 accept
2 S7 6
3 S4 8 10 9
4 r4
5 S13/r6 S14/r6
6 r1
7 S15 S13 r6 13 12
8 S15
9 S5/r6 S4 8 10 9
10 S15 16
11 S19
12 S17 S4 20 21
13
14 r2
15 S15/r6 S22
16 r2
17 S5 12
18 r1
19 r5 r5 26
20 S5 S4 8 10 9
21 r3/S4
22 r5
23 S28
24 S28
25
26 r2
27 S13 S14
28 r5 r5
Since the table contains multiply-defined entries (for example, action [5, a] = S13/r6), the above grammar is not LR(1).
Example 53: Check whether the following grammar is LALR(1) or not.
S → bABc
A → Ad ; | ∈
B → B;C|c
C → a
Solution: The augmented grammar is,
S' → S
1. S → bABc
2. A → Ad;
3. A → ∈
4. B → B; C
5. B → C
6. C → a
text.y (yacc program) → YACC compiler → y.tab.c → C compiler → a.out
Fig. 3.45: Process of Translation of a YACC Program
The Declaration Section:
• It includes declarations of the tokens, along with their values, that are used on the parser stack and declared in the lex program. Normal C code declarations are written within %{ and %}.
• Then we declare the tokens and the start symbol as follows:
%start startsymbol
%token name value
• We can use tokens without declaration by enclosing them in single quote as '=', '+'.
• We can define more than one token as % token token1 value1 token2 value2 token3
value3 ……
• We end the above section using %% and then define the rules of the grammar. These rules are the production rules of the grammar, each associated with a semantic action. We define the production rules of a grammar as:
left hand side → production1 | production2 | production3 |
• This will define yacc specification as:
left hand side : production 1 {semantic action1}
| production 2 {semantic action 2}
:
:
| production n {semantic action n}
;
Instead of '→', here ':' (colon) is used.
• In YACC production if we write single character enclosed within single quote then it is
treated as terminal.
• Strings of letters and digits that are not enclosed within quotes are treated as non-terminal symbols.
• One or more production rules are separated by vertical bar and each production rule
ends with semicolon. Always first left-hand side is taken to be start symbol.
%nonassoc UMINUS
• Declaring in this way tells yacc that "+" and "–" are left associative and at the lowest precedence level. "*" and "/" are also left associative, but their precedence is higher than the + and – operators. For unary minus, a pseudo-token UMINUS is used.
• Suppose we add the production rule E → E ↑ E to the grammar; then we declare its precedence as,
%right '^'
which comes after the * and / line, since ^ has the highest precedence.
• Whenever yacc finds a shift/reduce conflict due to an ambiguous grammar, it consults the table of precedences and the conflict is resolved.
• As we know, "–" has low precedence, but we want unary minus to have higher precedence than multiplication or division.
• So we use %prec UMINUS in the yacc rule specification, which tells yacc to use the precedence of UMINUS (unary minus), not that of the binary minus.
Example of a YACC specification for the expression grammar:
%{
#include "lex.yy.c"
%}
%start S
%token id 255
%left '-' '+'
%left '*' '/'
%nonassoc UMINUS
%%
S : E {printf ("%d\n", $1);}
  ;
E : E '+' E {$$ = $1 + $3;}
  | E '-' E {$$ = $1 - $3;}
  | E '*' E {$$ = $1 * $3;}
  | E '/' E
    {
      if ($3 == 0)
        yyerror ("divide by zero");
      else
        $$ = $1 / $3;
    }
asg : a asg
| a
;
a : ID EQ NUM
| ID EQ ID
;
ID : id
;
NUM : no
;
EQ : eq
;
sta : s
;
sto : sp
;
% %
main()
{
printf ("Enter input \n");
yyparse ( );
}
yyerror()
{
printf ("ERROR")
}
Example 56: Write a yacc specification for the grammar of the while-do statement in Pascal.
S → while Cond do begin Stats end
Cond → id op Idnum
Stats → id := Idnum
Idnum → id | num
Solution:
/* YACC specification */
%{
#include "lex.yy.c"
5. Precise error indication is not possible. | It detects the errors using the parse table.
6. FIRST and FOLLOW functions are not required. | FIRST and FOLLOW functions are required.
Difference between Top-down and Bottom-up Parsing:
Sr. No. | Top-down Parsing | Bottom-up Parsing
1. | Top-down parser uses derivation. | It uses reduction.
2. | Parsing starts from the starting symbol and derives the input string. | Parsing starts from the input string and reduces it to the starting symbol.
3. | Left-recursion and backtracking are two problems occurring in this parsing. | No left-recursion and backtracking problems.
4. | Ambiguous grammars are not suitable for top-down parsing. | It accepts ambiguous grammars.
5. | Precise error indication is not possible. | Precise error indication is possible.
6. | This uses LL(1) grammar. | This uses LR grammar.
7. | It scans the input from left-to-right and generates the leftmost derivation. | It scans the input from left-to-right and generates the rightmost derivation.
8. | Top-down parsers are: recursive descent parser, predictive parser. | Bottom-up parsers are: operator precedence parser, LR parser (SLR, LR(1), LALR).
9. | No conflicts occur. | Shift-reduce and reduce-reduce conflicts occur.
Difference between LL Parser and LR Parser:
Sr. No. | LL Parser | LR Parser
1. | The first L in the LL parser is for scanning the input from left to right, and the second L is for the leftmost derivation. | The L in the LR parser is for scanning the input from left to right, and R stands for the rightmost derivation in reverse order.
2. | LL follows the leftmost derivation. | LR follows the rightmost derivation in reverse order.
3. | An LL parser expands the non-terminals. | An LR parser reduces (condenses) to the non-terminals.
PRACTICE QUESTIONS
Q. I Multiple Choice Questions:
1. Which is the process of determining whether a string of tokens can be generated
by a grammar?
(a) lexical analysis (b) syntax analysis
(c) semantic analysis (d) None of the mentioned
2. Parsing takes the tokens produced by lexical analysis as input and generates,
(a) a parse tree (b) a syntax tree
(c) Both (a) and (b) (d) None of the mentioned
3. The program which performs syntax analysis is called as,
(a) parser or syntax analyzer (b) lexical analyzer
(c) semantic analyzer (d) None of the mentioned
4. What does a syntactic analyzer do?
(a) maintain symbol table (b) collect data structure
(c) create syntax/parse tree (d) None of the mentioned
5. Which is basically a sequence of production rules, in order to get the input string?
(a) Parse tree (b) Derivation
(c) Rotation (d) None of the mentioned
14. Which parser is a top-down parser for a restricted context-free language, and parses the input from left to right, performing a leftmost derivation of the sentence?
(a) LR parser (Left-to-right, Rightmost derivation in reverse)
(b) LL (Left-to-right, Leftmost derivation) parser
(c) Predictive parser
(d) None of the mentioned
15. Which is a non-recursive, shift-reduce, bottom-up parser?
(a) Shift-reduce parsing (b) LL parser
(c) Predictive parser (d) LR parser
16. If A → β is a production, then reducing β to A by the given production is called handle pruning, i.e., removing the children of A from the parse tree. A rightmost derivation in reverse can be obtained by,
(a) handle pruning (b) predictive pruning
(c) shift-reduce pruning (d) None of the mentioned
17. Which parser is quite efficient at finding the single correct bottom-up parse in a
single left-to-right scan over the input stream, without backtracking?
(a) canonical LR or LR(1) parser (b) LALR (Look-Ahead LR) parser
(c) SLR (Simple LR) parser (d) Performance Testing
18. YACC stands for,
(a) Yet Another Computer – Compiler (b) Yet Another Compiler – Compiler
(c) Yet Another Compilation – Compiler (d) None of the mentioned
Answers
1. (b) 2. (c) 3. (a) 4. (c) 5. (b) 6. (c) 7. (a) 8. (c) 9. (b) 10. (d)
11. (b) 12. (c) 13. (a) 14. (b) 15. (d) 16. (a) 17. (c) 18. (b)
Q. II Fill in the Blanks:
1. Syntax analysis checks the _______ structure of the given input, i.e. whether the
given input is in the correct syntax or not.
2. _______ analyzers follow production rules defined by means of context-free
grammar.
3. The program which performs syntax analysis is called as _______ .
4. CLR parsing uses the _______ collection of LR (1) items to construct the CLR (1)
parsing table.
5. If the sentential form of an input is scanned and replaced from left to right, it is
called _______ derivation.
6. A syntax analyzer takes the input from a lexical analyzer in the form of _______
streams.
7. A parse tree depicts associativity and precedence of _______ .
8. _______ parsers begin with the root and grow the tree toward the leaves.
9. _______ factoring is a grammar transformation which is useful for producing a grammar suitable for recursive-descent parsing or predictive parsing.
10. If one derivation of a production fails, the syntax analyzer restarts the process
using different rules of same production using _______ .
11. A canonical LR parser or LR(1) parser is an LR(k) parser for k=1, i.e. with a _______
lookahead terminal.
12. A parse tree is a _______ depiction of a derivation.
13. An LL parser accepts _______ grammar.
14. If an operand has operators on both sides (left and right), the side on which the
operator takes this operand is decided by the _______ of those operators.
15. First and Follow sets can provide the _______ position of any terminal in the
derivation.
16. _______ descent parsing suffers from backtracking.
17. The parse tree is constructed by using the _______ grammar of the language and
the input string.
18. _______ parser consists of an input, an output, a stack, a driver program and a
parsing table that has two functions Action and Goto
19. In _______ parser, the input string will be reduced to the starting symbol.
20. The sentential form (string) which matches the RHS of production rule while
reduction, then that string is called _______ .
21. Recursive descent is a _______ parsing technique that constructs the parse tree
from the top and the input is read from left to right.
22. A form of recursive-descent parsing that does _______ require any back-tracking is
known as predictive parsing.
23. The _______ can be produced by handling the rightmost derivation in reverse, i.e.,
from starting symbol to the input string.
24. A CFG is said to _______ grammar if there exists more than one derivation tree for
the given input string i.e., more than one LeftMost Derivation Tree (LMDT)
or RightMost Derivation Tree (RMDT).
25. An operator _______ grammar is a context-free grammar that has the property that
no production has either an empty right hand side (null productions) or two
adjacent non-terminals in its right-hand side.
26. Predictive parsing uses a _______ and a parsing _______ to parse the input and
generate a parse tree.
27. _______ is a utility on UNIX used for parser generator.
28. The process of discovering a handle and reducing it to the appropriate left hand
side is called handle _______ .
Answers
1. syntactical 2. Syntax 3. parser 4. canonical
5. left-most 6. token 7. operators 8. Top-down
9. Left 10. Backtracking 11. single 12. graphical
13. LL 14. associativity 15. actual 16. Recursive
17. pre-defined 18. LR 19. shift-reduce 20. handle
21. top-down 22. not 23. reduction 24. ambiguous
25. precedence 26. stack, table 27. YACC 28. pruning
Q. III State True or False:
1. Syntax analysis is also known as parsing.
2. The parser analyzes the source code (token stream) against the production rules to
detect any errors in the code.
3. If we scan and replace the input with production rules, from right to left, it is
known as right-most derivation.
4. Leading and Trailing are functions specific to generating an operator-precedence parser, which is only applicable if you have an operator precedence grammar.
5. An operator precedence parser is a bottom-up parser that interprets an operator-
precedence grammar.
6. YACC assists in the previous phase of the compiler.
7. In left factoring technique, we make one production for each common prefixes and
the rest of the derivation is added by new productions.
8. Bottom-up parsers begin with the leaves and grow the tree toward the root.
9. The LR parser is a shift-reduce, top-down parser.
10. If two different operators share a common operand, the precedence of operators
decides which will take the operand.
11. A grammar becomes left-recursive if it has any non-terminal ‘A’ whose derivation
contains ‘A’ itself as the left-most symbol.
12. Left-recursive grammar is considered to be a problematic situation for top-down
parsers.
13. The LL (1) grammars are suitable for bottom-up parsing.
14. LR parsing tables are a two-dimensional array in which each entry represents an
Action or Goto entry.
15. Handle pruning forms the basis for a bottom-up parsing.
16. LALR parser, parse a text according to a set of production rules specified by
a formal grammar.
17. LR parsers are also known as LR(k) parsers.
18. A grammar G is said to be ambiguous if it has more than one parse tree (left or
right derivation) for at least one string.
Answers
1. (T) 2. (T) 3. (T) 4. (T) 5. (T) 6. (F) 7. (T) 8. (T) 9. (F) 10. (T)
11. (T) 12. (T) 13. (F) 14. (T) 15. (T) 16. (T) 17. (T) 18. (T)
Q. IV Answer the following Questions:
(A) Short Answer Questions:
1. What is syntax analysis?
2. “Top-down parsing can be implemented using shift-reduce method”. State true or
false.
3. What do LL and LR stand for?
4. List types of parsers.
5. What is the purpose of parser?
6. Define alphabet.
7. What is grammar?
8. Define reduction.
9. What is derivation?
10. Define parse tree.
11. What is meant by ambiguous grammar?
12. Define top-down parser.
13. What does second 'L' stand for in LL(1) Parser?
14. Define backtracking.
15. Give need for left factoring.
16. Define recursive-descent parsing.
17. YACC is an LR parser. Justify.
18. Define predictive parser.
19. Define parse table.
20. What is LL(1) grammar?
21. Define bottom-up parsing.
22. Define Canonical LR parser.
23. Define operator precedence.
24. What is meant by operator precedence grammar?
25. What is leading and trailing.
26. Define precedence function.
27. Construct LR(1) items for the following production S → a.
24. With the help of steps describe shift-reduce parser. Also state its purpose,
advantages and disadvantages.
25. Give relationship between reduction, handle and handle pruning in detail.
26. With the help of example describe stack implementation of shift-reduce parser.
27. Describe LR parser with its advantages.
28. What is SLR parsing? How does it work? Explain diagrammatically.
29. With the help of diagram describe model for LR parser.
30. What is an SLR parser? How does it work? Explain diagrammatically.
31. What is a CLR parser? How does it work? Explain diagrammatically.
32. Differentiate between SLR, CLR and LALR parsers.
33. What is YACC? Describe in detail.
34. Differentiate between recursive descent parser and predictive parser.
35. Differentiate between top-down and bottom-up parsing.
36. Differentiate between LL parser and LR parser.
37. Construct parsing table and check whether the following grammar is LL(1):
S → BC | AB
A → aAa | b
B → bAa | ∈
C → ∈
38. Check whether the following grammars are SLR (1):
(i) S → AS | a
A → SA | b
(ii) S → Aa | bAc | dc | bda
A → d
(iii) S → bAB | aA
A → Ab | b
B → aB | a
(iv) S → A and A | A or A
A → id | (S)
(v) S → L = R|R
L → * R | id
R → L
39. Check whether the following grammars are LL (1) :
(i) S → ScB | cA
B → bB | ∈
A → AaB | B
(ii) S → S#
S → aA | b | cB | d
A → aA | b
B → cB | d
(iii) S → aAbCC | Aba
A → BaA | Cb
B → bA | ∈
C → aBb | ∈
(iv) S → abAB | AbC
A → Ba | ∈
B → bA | Aa | ∈
40. Check whether the following grammars are LALR :
(i) S' → S
S → L=R|R
L → * R | id
R → L
(ii) E → E + T| T
T → T* F|F
F → F * | (E) | a | b | #
(iii) S → aAd | bBd | aBe | bAe
A → C
B → C
(iv) S → CC
C → aC | b
41. Check whether the following grammars are LR (1) :
(i) S → A*B|A+B
A → aA | a
B → Ab | b
(ii) S → Aa | bAc | Bc | bBa
A → d
B → c
(iii) S → 0A2
A → 1A1|1
42. Construct recursive descent parser for the following grammars :
(i) S → Ab
A → a|B|∈
B → b
(ii) E → E+T|T
T → T*F|F
F → (E) | id
(iii) S → aAb | Sa | a
A → Ab | b
(iv) S → Aab | aBb
A → Aa | b
B → bB | b
43. What is left-factoring? How is it used in a recursive descent parser?
44. Write a YACC program for simple calculator which performs operations like
23 ** 2 + 15 – 3/5 + 1.
45. Construct an SLR parsing table for the grammar:
E → E sub R | E sup E | {E} | ∈
R → E sup E | E
46. Check whether the following grammar is LALR (1) or not.
E → E+T|T
T → TF | F
F → F * | (E) | a | b | ∈
47. Construct recursive descent parser for the following grammar.
(i) S → abSa | aaAb | b
A → baAb | b
(ii) S → Aab | aBb
A → Aa | b
B → bB | b
48. Check whether the following grammars are SLR (1).
(i) S → 0A2
A → 1A1 | 1
(ii) S → L=R
S → R
L → * R
L → id
R → L
(iii) S → A|B
A → aA | b
B → dB | b
49. Explain the different types of conflicts that occur in an LR parser.
A → aA | b
B → eB | d [5 M]
Ans. Refer to Section 3.4.2.
3. Write a Recursive Descent Parser (RDP) for the following grammar:
A → aAa | Ab | AA | b [5 M]
Ans. Refer to Section 3.3.
4. Check whether the given grammar is SLR(1) or not:
S → bAB | aA
A → Ab | b
B → aB | a [6 M]
Ans. Refer to Section 3.8.2.
5. Check whether the given grammar is LALR(1) or not:
E → E+T|T
T → T*F|F
F → F* | (E) | a | b | # [6 M]
Ans. Refer to Section 3.8.2.
April 2017
5. Construct LALR(1) parsing table and check whether the given grammar is LALR(1)
or not:
S → AaB | B
A → bB | d
B → A|e [6 M]
Ans. Refer to Section 3.8.2.
6. Find out whether the following grammar is an operator precedence grammar or not:
S → a | ^ | (R)
T → S, T | S
R → T
Ans. Refer to Section 3.6.
October 2017
April 2019
CHAPTER 4
Syntax Directed Definition
4.0 INTRODUCTION
• The output of the lexical analysis phase is tokens. The parser checks the syntax of the language.
• Besides specifying the syntax of a language a context-free grammar can be used to
help guide the translation of program.
• A Syntax Directed Translation (SDT) specifies the translation of a construct in terms of
attributes associated with its syntactic components.
• The SDT is a commonly used notation for specifying attributes and semantic rules
along with the context-free grammar.
• A Syntax Directed Definition (SDD) is a generalization of a context-free grammar in
which each grammar symbol has an associated set of attributes, partitioned into two
subsets called the synthesized and inherited attributes of that grammar symbol.
• In this chapter, we introduce a grammar-oriented compiling technique known as
syntax-directed translation, and also introduce a semantic aspect of languages, namely
type checking.
• For simplicity, we consider the syntax-directed translation of infix expressions to postfix
form and build syntax trees for programming constructs.
• The main idea behind syntax-directed translation is that the semantics or the meaning
of the program is closely tied to its syntax.
• Most of the modern compiled languages exhibit this property. Syntax-directed
translation involves:
1. Identifying attributes of the grammar symbols in the context-free grammar.
2. Specifying semantic rules or attributes equations relating the attributes and
associates them with the productions.
3. Evaluating semantic rules to cause valuable side-effects like insertion of
information into the symbol table, semantic checking, issuing of an error message,
generation of intermediate code, and so on.
• An attribute is any property of a symbol. For example, the data type of a variable is an
attribute.
• A CFG in which a program fragment called output action (semantic action or semantic
rule) is associated with each production is known as Syntax Directed Translation
(SDT).
• In short, SDD given as follows:
Attributes + CFG + Semantic rules = Syntax Directed Definition (SDD).
• The principle of Syntax Directed Translation (SDT) states that the meaning of an input
sentence is related to its syntactic structure, i.e., to its Parse tree, (see Fig. 4.1).
<assign>
States that Related to Parse-Tree
Syntax Directed Input
Translation sentence <targets> := <exp>
id <exp> + id
Fig. 4.1: Concept of SDT
• Syntax-directed translation is done by attaching rules to productions in a grammar.
Consider an expression represented by production rule,
expr → expr1 + term
Here, expr is the sum of two sub-expressions expr1 and term.
• We translate this into pseudo-code as:
translate expr1;
translate term;
handle +;
• Then build a syntax tree for expr, and compute the values of the attributes at the
nodes of the tree by visiting them.
• The above production is written as:
Production Semantic rule
expr → expr1 + term expr ⋅ code = expr1 ⋅ code || term ⋅ code || '+'
• The semantic rule specifies that the string expr.code is formed by concatenating
expr1.code, term.code and character '+'.
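As a small illustration of such a rule, the following Python sketch computes the postfix code attribute bottom-up; the Num and Add classes are hypothetical stand-ins for parse-tree nodes, not part of the book's notation.

class Num:
    def __init__(self, v):
        self.code = str(v)                     # a digit is its own postfix code

class Add:
    def __init__(self, left, right):
        # expr.code = expr1.code || term.code || '+'
        self.code = left.code + " " + right.code + " +"

e = Add(Add(Num(9), Num(5)), Num(2))
print(e.code)                                  # prints: 9 5 + 2 +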
o In syntax-Directed translation grammar, symbols are associated with attributes to
associate information with the programming language constructs that they
represent.
o Values of these attributes are evaluated by the semantic rules associated with the
production rules.
o Evaluation of these semantic rules:
generate intermediate codes.
generates error messages.
put information into the symbol table.
perform type checking.
perform almost any activities above.
o An attribute may hold almost any thing.
a string, a number, a memory location, a complex record, table references.
• Sometimes, translation can be done during parsing. Therefore, we study a class of
syntax-directed translations called "L-attributed translations" (L for left-to-right), in
which translation can be performed during parsing.
• A smaller class, called "S-attributed translations" (S for synthesized), can be
performed with a bottom-up parse.
Form of a Syntax-Directed Definition (SDD):
• In a syntax-directed definition, each grammar production A → α is associated with a
set of semantic rules of the form b = f(C1, C2, …, Ck), where f is a function and either:
o b is a synthesized attribute of A and C1, C2, …, Ck, are attributes belonging to the
grammar symbols of the production, or
o b is an inherited attribute of one of the grammar symbols on the right side of the
production and C1, C2, … Ck, are attributes belonging to the grammar symbols of the
production.
• In either case, we say that the attribute b depends on the attributes C1, C2, …, Ck. An
attribute grammar is a syntax-directed definition in which the functions in semantic
rules cannot have side effects.
• Syntax-Directed Translation (SDT) is an extension of Context-Free Grammar (CFG)
which acts as a notational framework for the generation of intermediate code.
• A parse tree showing the values of attributes at each node is called an annotated parse
tree. [April 16, 17, 19]
• The process of computing the attributes values at the nodes is called annotating or
decorating of the parse tree.
• An annotated parse tree for the input string 3 * 5 + 4 n is shown in Fig. 4.2.
[Annotated parse tree omitted: digit.lexval values 3, 5 and 4 propagate upwards as F.val and T.val, giving T.val = 15 for 3 * 5, E.val = 19 and L.val = 19]
Fig. 4.2: Annotated Parse Tree for 3 * 5 + 4 n
o we associate a production rule with a set of semantic actions and we do not say
when they will be evaluated.
2. Translation Schemes:
o indicate the order of evaluation of the semantic actions associated with a production
rule.
o in other words, translation schemes give some information about implementation
details.
Attribute Grammar: [April 16]
• Attribute grammar is a special form of context-free grammar where some additional
information (attributes) is appended to one or more of its non-terminals in order to
provide context-sensitive information.
• Each attribute has well-defined domain of values, such as integer, float, character,
string, and expressions.
• Attribute grammar is a medium to provide semantics to the context-free grammar and
it can help specify the syntax and semantics of a programming language.
• Attribute grammar (when viewed as a parse-tree) can pass values or information
among the nodes of a tree.
Example,
E → E + T { E.value = E.value + T.value }
• The right part of the CFG contains the semantic rules that specify how the grammar
should be interpreted. Here, the values of non-terminals E and T are added together
and the result is copied to the non-terminal E.
• Semantic attributes may be assigned to their values from their domain at the time of
parsing and evaluated at the time of assignment or conditions.
• Based on the way the attributes get their values, they can be broadly divided into two
categories namely, synthesized attributes and inherited attributes.
2. Inherited Attribute: [Oct. 17]
• An inherited attribute for a non-terminal Y at a parse tree node N is defined by a
semantic rule associated with the production at the parent of N.
• An inherited attribute of node N is defined in terms of attribute values at N's parent, N itself and N's
siblings.
• Terminals can have synthesized attributes, but terminals can not have inherited
attributes.
• For terminal symbols there are no semantic rules for computing the value of an
attribute; they have lexical values supplied by lexical analyzer.
Example 1: Consider an example grammar of expressions. Fig. 4.3 shows the SDD for the grammar.
L → E
E → E1 + T | T
T → T1 * F | F
F → (E) | digit
Production Semantics Rules
L → E return Print (E.val) or L.val = E⋅val
E → E1 + T E⋅val = E1⋅val + T⋅val
E→ T E⋅val = T⋅val
T → T1 * F T⋅val = T1⋅val * F⋅val
T→ F T⋅val = F⋅val
F → (E) F⋅val = E⋅val
F → digit F⋅val = digit⋅lexval
Fig. 4.3: SDD of the Grammar
o Symbols E, T and F are associated with synthesized attribute val.
o The token digit has a synthesized attribute lexval (it is assumed that it is evaluated
by the lexical analyzer).
o The production L → E return, where return is an endmarker, sets L.val to E.val and
produces the result of the entire expression.
o For production E → E1 + T, the val for the E is the sum of the values of E1 and T (all
its children).
o For production E → T, the val for E is the same as the val at the child T.
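A sketch of this bottom-up evaluation in Python, with parse-tree nodes encoded as (kind, children) pairs (an assumed representation), reproduces the value 19 of the annotated tree in Fig. 4.2:

def val(node):
    kind, kids = node
    if kind == "digit":          # F -> digit : F.val = digit.lexval
        return kids
    if kind == "+":              # E -> E1 + T : E.val = E1.val + T.val
        return val(kids[0]) + val(kids[1])
    if kind == "*":              # T -> T1 * F : T.val = T1.val * F.val
        return val(kids[0]) * val(kids[1])

tree = ("+", [("*", [("digit", 3), ("digit", 5)]), ("digit", 4)])
print(val(tree))                 # 19, the value at the root for 3 * 5 + 4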
S-attributed SDD: [Oct. 18]
• An SDD that involves only synthesized attributes is called S-attributed. The above
(Fig. 4.3) SDD is S-attributed.
Fig. 4.4: Circular Dependency of Attributes
o However, for some useful subclasses of SDDs such circular dependencies cannot
arise, and the attributes can always be evaluated.
Example 3: Consider the grammar rules of Fig. 4.3 and construct the annotated parse
tree for the input string 5 + 3 * 4 return. This is a synthesized-attribute SDD.
Fig. 4.5: Annotated Parse Tree for 5 + 3 * 4
Here, each non-terminal's attribute val is computed in bottom-up order, i.e. the children's
attributes are computed first and then the rule is applied at the parent node. Consider the node for
production T → T1 * F. The value of attribute T.val is defined by,
Production Semantic Rules
T → T1 * F T ⋅ val = T1 ⋅ val * F⋅val
T' → * FT' | ∈
F → digit
The parse tree for the input string 4 * 5 * 3 is shown in Fig. 4.6.
Fig. 4.6: Parse Tree
Fig. 4.7 shows the SDD, based on the above grammar, suitable for top-down parsing.
Production      | Semantic Rules
1. E → TE'      | E'⋅inh = T⋅val ; E⋅val = E'⋅syn
2. E' → + TE'1  | E'1⋅inh = E'⋅inh + T⋅val ; E'⋅syn = E'1⋅syn
3. E' → ∈       | E'⋅syn = E'⋅inh
4. T → FT'      | T'⋅inh = F⋅val ; T⋅val = T'⋅syn
5. T' → * FT'1  | T'1⋅inh = T'⋅inh * F⋅val ; T'⋅syn = T'1⋅syn
6. T' → ∈       | T'⋅syn = T'⋅inh
7. F → digit    | F⋅val = digit⋅lexval
Fig. 4.7: SDD with Inherited Attributes
o The non-terminals E, T and F have a synthesized attribute val and terminal digit
has a synthesized attribute lexval.
o The non-terminals E' and T' has two attributes an inherited attribute inh and a
synthesized attribute syn.
o Consider the rule T' → * FT'1: the head T' inherits the left operand of * in the
production. For example, in a * b * c the root of the subtree for * b * c inherits a;
then the root of the subtree for * c inherits the value of a * b, and so on. Fig. 4.8 shows the
annotated parse tree for 4 * 5 * 3.
∴ T'1⋅inh = 4 * 5 = 20
Fig. 4.10: Annotated Parse Tree with Inherited Attributes for Input String real id1, id2, id3
• A dependency graph depicts the flow of information among the attribute in-stances in
a particular parse tree; an edge from one attribute instance to another means that the
value of the first is needed to compute the second. Edges express constraints implied
by the semantic rules.
• It is useful and customary to depict the data flow in a node for a given production rule
by a simple diagram called a dependency graph.
• A dependency graph represents the flow of information between the attribute
instances in a parse tree.
• The inter-dependencies among the attributes at the nodes in a parse tree can be shown
by a dependency graph.
• It is used to depict the inter-dependencies among the inherited and synthesized
attributes at the nodes in a parse tree.
Construction of the Dependency Graph:
1. Put each semantic rule into the form b: f (c1, c2, … ck), by introducing a dummy
synthesized attribute b.
2. The graph has a node for each attribute associated with a grammar symbol and an
edge to the node for b from node for c if attribute b depends on attribute c.
3. For each attribute a of the grammar symbol at n construct a node in the
dependency graph for a.
4. For each semantic rule b = f (c1, c2, …, ck), construct an edge to the node for b from the
node for ci, for each 1 ≤ i ≤ k.
• For example, suppose A → XY is a production with semantic rule A ⋅ a: = f (X ⋅ x, Y ⋅ y).
This rule defines a synthesized attribute A ⋅ a that depends on the attribute X ⋅ x and
Y ⋅ y.
• For this production, we have 3 nodes in the dependency graph, and the edges are:
1. to A ⋅ a from X ⋅ x (since A ⋅ a depends on X ⋅ x)
2. to A ⋅ a from Y ⋅ y (since A ⋅ a depends on Y ⋅ y).
• If we consider inherited attribute, then the production A → XY has a semantic rule
X ⋅ i = g (A ⋅ a, Y ⋅ y)
then the edges in the dependency graph are:
• X ⋅ i from A ⋅ a
Here X ⋅ i depends on A ⋅ a and Y ⋅ y
• X ⋅ i from Y ⋅ y
Fig. 4.11: The Dependency Graph
Example 7: Construct the dependency graph for the input 5 + 3 * 4 for the expression grammar.
Solution:
Fig. 4.12: Dependency Graph for Input 5 + 3 * 4
Example 8: Consider the grammar:
D → TL
T → int | real
L → L, id | id
Construct the dependency graph.
Solution:
Fig. 4.13: Dependency Graph
3. T' → ∈ T'⋅syn = T'⋅inh
4. F → digit F⋅val = digit⋅lexval
The dependency graph for the input string digit * digit is shown in Fig. 4.14.
Fig. 4.14: The Dependency Graph for the Input String digit * digit
Here, in the first production rule, the inherited attribute T' ⋅ inh is defined using only F⋅val,
and F appears to the left of T' in the production rule. In the second production rule,
T'1 ⋅ inh is defined using the attribute T' ⋅ inh and F⋅val, and F appears to the left of T'1 in the
production rule.
Example 11: Consider the grammar with the production A → XY.
Solution:
Production Semantic rule
A → XY A⋅syn = X⋅val
X⋅inh = f (Y⋅val, A⋅syn)
The first semantic rule, A⋅syn = X⋅val, is allowed in both S-attributed and L-attributed SDDs:
it defines the synthesized attribute A⋅syn in terms of an attribute of a child (X occurs in the
production body).
The second rule defines an inherited attribute X⋅inh, so the SDD cannot be S-attributed.
The SDD also cannot be L-attributed, because the attribute Y⋅val is used to define X⋅inh and Y
is to the right of X in the production body.
Any SDD containing the above production is neither S-attributed nor L-attributed.
• L-Attributed Definition: An SDD is L-attributed if each inherited attribute of Xi,
1 ≤ i ≤ n, on the right side of A → X1 X2 … Xn depends only on,
1. the attributes of the symbols X1, X2, …, Xi–1 to the left of Xi in the production rule,
and
2. the inherited attributes of A.
Note: Every S-attributed definition is L-attributed, because rules 1 and 2 apply only to
inherited attributes.
Example 12: Consider the following SDD.
Production Semantic rules
1. A → LM L⋅inh = l (A⋅inh)
M⋅inh = m (L⋅syn)
A⋅syn = f (M⋅syn)
2. A → QR R ⋅ inh = r (A⋅inh)
Q ⋅ inh = q (R⋅syn)
A ⋅ syn = f (Q⋅syn)
In this SDD, the inherited attribute Q ⋅ inh of the grammar symbol Q depends on R⋅syn, an
attribute of the symbol R to its right. Therefore, this SDD is not L-attributed.
Example 13: Consider the following SDD:
Production | Semantic Rules
1. D → TL L⋅inh = T⋅type
2. T → int T⋅type = integer
3. T → real T⋅type = real
4. L → L1, id L1⋅inh = L⋅inh
addtype (id⋅entry, L⋅inh)
5. L → id addtype (id⋅entry, L⋅inh)
This SDD is for type declaration.
For example, int a, b, c
real x, y.
o T⋅type is an attribute of T, which is the type in the declaration D.
o L has inherited attribute, to pass the declare type down the list of identifier.
o Production 2 and 3 are having synthesized attribute T⋅type for integer or real
value.
o This type is passed to L⋅inh of production 1.
o In production 4, the value of L1⋅inh is computed by copying the value of L⋅inh from
the parent of that node.
o The function addtype is called with 2 arguments id⋅entry, which is lexical value
point to the symbol table and L⋅inh is the type assign to every identifier.
The dependency graph is shown in Fig. 4.15 for input real id1, id2, id3.
Fig. 4.15: Dependency Graph for real id1, id2, id3
o In this SDD, nodes 6, 8 and 10 are dummy attributes that represent the calls to the
function addtype to enter a type.
o For each identifier on the list, the type is entered into the symbol-table entry for
the identifier, so entering the type for one identifier does not affect the symbol-table
entry for any other identifier.
o So the entries can be updated in any order, which keeps the side effects under control.
Example 14: Construct the annotated parse tree for the declaration int a, b, c using the SDD.
Solution:
Fig. 4.16: Annotated Parse Tree
Fig. 4.17: Syntax Tree
• In syntax tree, operators and keywords do not appear as leaves, they are interior
nodes.
• Syntax-directed translation can be based on syntax trees. The syntax tree for
expression, 3 + 4 * 5 is shown in Fig. 4.18.
Fig. 4.18: Syntax Trees for Expressions
• The construction of syntax tree for an expression is similar to the translation of an
expression into postfix form.
• By creating a node for each operator and operand, construct the subtrees. Each node
in a syntax tree is a record of several fields.
• If the node is operator, then the fields are: operator and pointers to the node for the
operands. The operator is the label of the node.
• During translation syntax tree nodes may have additional values of attributes attached
to the node.
• The following functions are used to create the nodes of a binary tree for an expression:
1. new node (op, left, right): It creates an operator node with label op and two
pointers, left and right.
2. new leaf (id, entry): It creates an identifier node with label id and a field
containing entry, a pointer to the symbol-table entry for the identifier.
3. new leaf (num, val): It creates a number node with label num and a field
containing val, the value of the number.
Example 15: Consider a sequence of function calls to create a syntax tree for the
expression a + 5 – b.
Solution: Let P1, P2, …, P5 be pointers to nodes, and let entrya and entryb be pointers to
the symbol-table entries for identifiers a and b respectively.
1. P1 = new leaf (id, entrya);
2. P2 = new leaf (num, 5);
3. P3 = new node ('+', P1, P2);    Steps in the construction of the syntax tree
4. P4 = new leaf (id, entryb);
5. P5 = new node ('–', P3, P4).
The tree is constructed bottom-up. The first and second function calls construct the
leaves for a and 5, and the pointers are saved in P1 and P2. The call new node
('+', P1, P2) constructs an interior node with the leaves for a and 5 as its children.
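A sketch of these calls in Python; the Node and Leaf classes are hypothetical stand-ins for the record structures described above.

class Node:                                   # operator node: label + two children
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

class Leaf:                                   # identifier or number node
    def __init__(self, label, value):
        self.label, self.value = label, value

p1 = Leaf("id", "entry_a")                    # new leaf (id, entrya)
p2 = Leaf("num", 5)                           # new leaf (num, 5)
p3 = Node("+", p1, p2)                        # new node ('+', P1, P2)
p4 = Leaf("id", "entry_b")                    # new leaf (id, entryb)
p5 = Node("-", p3, p4)                        # new node ('-', P3, P4): the root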
The S-attributed definition of Fig. 4.19 constructs a syntax tree for a simple expression
containing the operators + and –.
Fig. 4.20: Construction of the Syntax Tree for Expression a + 5 – b
Example 16: Consider the L-attributed definition of Fig. 4.21, which uses top-down
parsing. The grammar is non-left-recursive. Consider the expression a + 5 – b and
construct the syntax tree using the definition of Fig. 4.21.
Production   | Semantic Rules
1. E → TE'   | E⋅node = E'⋅syn
             | E'⋅inh = T⋅node
             | E'⋅syn = E'1⋅syn
Fig. 4.22: Dependency Graph for a + 5 – b with the SDD of Fig. 4.21
Fig. 4.23: Tree for int [2] [4]
SDD for this type expression is shown in Fig. 4.24.
Production Semantic rules
1. D → AB D⋅type = B⋅type
B⋅a = A⋅type
2. A → int A⋅type = integer
3. A → float A⋅type = float
4. B → [num] B1 B⋅type = array (num⋅val, B1⋅type)
B1⋅a = B⋅a
5. B → ∈ B⋅type = B⋅a
Fig. 4.24: SDD for Array Type
o D generates basic data type or an array type.
o A generates either int type or float type.
o If the derivation is,
D ⇒ AB ⇒ int B ⇒ int or
D ⇒ AB ⇒ float B ⇒ float,
then D generates a basic type (B derives ∈).
o If the derivation is,
D ⇒ AB ⇒ int [num] B1
⇒ int [num] [num] B1
⇒ int [num] [num]
then D generates an array type.
o The non-terminals D and A have a synthesized attribute type. The non-terminal B
has two attributes: inherited attribute a and synthesized attribute type. Inherited
attribute is used to pass the attribute value a down the tree.
o The non-terminal B inherit type from A.
o The Fig. 4.25 shows the annotated parse tree for input string int [2] [4].
Fig. 4.25: Annotated Parse Tree of Array Type
4.5.1 Definition
• SDT is context free grammar with program fragments embedded within production
bodies; where program fragment is called semantic actions.
• A translator for an arbitrary SDD can be difficult to build. However, there are large
classes of SDD for which translators are constructed.
• There are two classes of SDD's to construct translators:
1. S-attributed (LR-parsable)
2. L-attributed (LL-parsable).
• In this section we discuss one such class, the S-attributed definitions. Synthesized
attributes can be evaluated by a bottom-up parser.
• The parser keeps the values of S-attributes associated with the grammar symbols on its
stack, which we will discuss in this section.
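A sketch of this idea: a value stack is kept in parallel with the state stack, and each reduction pops the children's values and pushes the parent's synthesized value. The production names and the simplified reduction sequence here are illustrative only.

def reduce_action(prod, values):
    # Pop the children's attribute values and return the parent's value.
    if prod == "F -> digit":
        return values.pop()                   # F.val = digit.lexval
    if prod == "T -> T * F":
        f = values.pop(); t = values.pop()
        return t * f                          # T.val = T1.val * F.val
    if prod == "E -> E + T":
        t = values.pop(); e = values.pop()
        return e + t                          # E.val = E1.val + T.val

values = [5, 3]                               # value stack after shifting 5 and 3
values.append(reduce_action("E -> E + T", values))
print(values)                                 # [8]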
Fig. 4.27: Annotated Parse Tree for (5 + 3 * 4)
These actions can be performed correctly along with the reduction steps of the parser.
PRACTICE QUESTIONS
Q. I Multiple Choice Questions:
1. Which refers to a method of compiler implementation where the source language
translation is completely driven by the parser?
(a) syntax-directed translation (b) compiler-directed translation
(c) code-directed translation (d) None of the mentioned
2. Syntax directed translation can be based on,
(a) syntax tree (b) parse tree
(c) syntax tree as well as parse tree (d) None of the mentioned
3. Which is a useful tool for determining an evaluation order for the attribute
instances in a given parse tree?
(a) wait-for-graph (b) dependency graph
(c) sparse-tee graph (d) None of the mentioned
4. Which specifies the values of attributes by associating semantic rules with the
grammar productions?
(a) syntax-directed translation (b) compiler-directed translation
(c) code-directed definition (d) Syntax Directed Definition (SDD)
5. An attribute of the non-terminal on the left-hand side of a production is called
a,
(a) Inherited attribute (b) Synthesized attribute
(c) Both (a) and (b) (d) None of the mentioned
6. An attribute of a non-terminal on the right-hand side of a production is called an,
(a) Inherited (I attribute) (b) Synthesized (S attribute)
(c) Both (a) and (b) (d) None of the mentioned
7. Which is a special form of context-free grammar where some additional
information (attributes) is appended to one or more of its non-terminals in order
to provide context-sensitive information?
(a) Regular Grammar (b) Context-Free Grammar
(c) Operator Grammar (d) Attribute Grammar
8. A parse-tree, with values of its attributes at each node is called as,
(a) annotated parse tree (b) annotated compiler tree
(c) annotated dependency graph tree (d) None of the mentioned
9. The methods for ordering the evaluation of attributes includes,
(a) Oblivious method (b) Parse-tree (topological sort) method
(c) Rule-based method (d) All of the mentioned
10. SDT's with all actions at the right ends of the production bodies are called as,
(a) prefix SDT (b) postfix SDT
(c) parserfix SDT (d) None of the mentioned
11. Which of the following error is expected to recognize by semantic analyzer?
(a) Type mismatch (b) Undeclared variable
(c) Reserved identifier misuse. (d) All of the mentioned
12. In a bottom-up evaluation of a syntax directed definition, inherited attributes can
be,
(a) always be evaluated
(b) evaluated only if the definition is L-attributed
(c) be evaluated only if the definition has synthesized attributes
(d) never be evaluated
13. What is true about Syntax Directed Definitions?
(a) Syntax Directed Definitions + Semantic rules = CFG
(b) Syntax Directed Definitions + CFG = Semantic rules
(c) CFG + Semantic rules = Syntax Directed Definitions
(d) None of the mentioned
Answers
1. (a) 2. (c) 3. (b) 4. (d) 5. (b) 6. (a) 7. (d) 8. (a) 9. (d) 10. (b)
11. (d) 12. (b) 13. (c)
Q. II Fill in the Blanks:
1. Syntax Directed Definition (SDD) is a _______ grammar with attributes and rules
together which are associated with grammar symbols and productions
respectively.
2. _______ is a medium to provide semantics to the context-free grammar and it can
help specify the syntax and semantics of a programming language.
3. _______ of a language provide meaning to its constructs, like tokens and syntax
structure.
4. Syntax directed definition that involves only _______ attributes is called S-
attributed.
5. Syntax trees are _______ top-down and left to right.
6. Attributes of _______ definitions may either be synthesized or inherited.
7. If an SDT uses only synthesized attributes, it is called as S-attributed _______ . These
attributes are evaluated using S-attributed SDTs that have their semantic actions
written after the production (right hand side).
8. Synthesized attributes represent information that is _______ passed up the parse
tree.
9. Syntax Directed Translation (SDT) are _______ rules to the grammar that facilitate
semantic analysis.
10. Semantic analyzer receives AST (Abstract Syntax Tree) from its _______ stage
(syntax analysis).
Answers
1. context-free 2. Attribute grammar 3. Semantics 4. synthesized
5. parsed 6. L-attributed 7. SDT 8. being
9. augmented 10. previous
Q. III State True or False:
1. The main idea behind SDD is that the semantics or the meaning of the program is
closely tied to its syntax.
2. The attributes of an S-attributed SDD can be evaluated in bottom up order of
nodes of the parse tree.
3. The syntax directed definition in which the edges of dependency graph for the
attributes in production body, can go from left to right and not from right to left is
called L-attributed definitions.
4. The general approach to Syntax-Directed Translation is to construct a parse tree or
syntax tree and compute the values of attributes at the nodes of the tree by visiting
them in some order.
5. The inherited attribute can take value either from its child or from its siblings.
6. Attribute grammar is a medium to provide semantics to the context-free grammar.
Semantics help interpret symbols, their types, and their relations with each other.
7. L-attributed SDT is the form of SDT that uses both synthesized and inherited
attributes, with the restriction of not taking values from right siblings. In
L-attributed SDTs, a non-terminal can get values from its parent, children, and
left-sibling nodes.
8. Some of the semantics errors that the semantic analyzer is expected to recognize
include Accessing an out of scope variable, Type mismatch, Undeclared variable,
Reserved identifier misuse and so on.
9. A dependency graph represents the flow of information between the attribute
instances in a parse tree.
10. Semantic analyzer attaches attribute information with AST, which are called
Attributed AST.
Answers
1. (T) 2. (T) 3. (T) 4. (T) 5. (F) 6. (T) 7. (T) 8. (T) 9. (T) 10. (T)
Q. IV Answer the following Questions:
(A) Short Answer Questions:
1. Define SDD.
2. What is SDT?
2. Define annotated parse tree. For the input expression 3*5+4n, draw an annotated
parse tree using the following SDD:
1. Terminals can have synthesized attributes, but not inherited attributes. State true
or false. [1 M]
Ans. Refer to Section 4.2.1.
2. Define SDD. (Syntax Directed Definitions). [1 M]
Ans. Refer to Section 4.2.
3. Consider the following SDD and find the dependency graph for the
expression, 7 * 5:
Production Rules          Semantic Rules
S → A B                   B.inh = A.val
                          S.val = B.syn
B → * A B1                B1.inh = B.inh * A.val
                          B.syn = B1.syn
B → ε                     B.syn = B.inh
A → digit                 A.val = digit.lexval [5 M]
Ans. Refer to Section 4.2.2.
4. Write the steps to construct syntax tree using the semantic rules. Construct a
syntax tree for the following SDD:
Production Rules Semantic Rules
E → E1 + T T.node = new Node ('+', E1.node, T.node)
E → E1 – T E.node = new Node ('–', E1.node, T.node)
E → T E.node = T.node
T → (E) T.node = E.node
T → id T.node = new Leaf (id, id.entry)
T → num T.node = new Leaf (num, num.val) [4 M]
Ans. Refer to Section 4.2.2.
October 2018
1. State True or False. An SDD is S-attributed if every attribute is synthesized. [1 M]
Ans. Refer to Section 4.2.1.
2. Write a short note on SDD (Syntax Directed Definitions). [4 M]
Ans. Refer to Section 4.2.
April 2019
5.0 INTRODUCTION
• Code generation is the process by which a compiler's code generator converts
some intermediate representation of source code into a form (e.g., machine code) that
can be readily executed by a machine.
• The code generator takes an intermediate representation of the source program as
input and produces an equivalent target program as output.
• Optimization means making the code smaller and less complex, so that it executes
faster and takes less memory space.
• In the process of translating the source language into the target language, the
compiler constructs a sequence of Intermediate Representations (IRs).
• Fig. 5.1 shows the back end of the compilation phases, where intermediate code is
generated and code optimization takes place.
Source Program → Front end → Intermediate representation → Code optimization →
Intermediate representation → Code generator → Target code
Fig. 5.1: Back End of the Compiler
• The compiler needs to produce efficient target programs, so the optimizer maps the
IR into an IR from which more efficient code can be generated.
• In this chapter, we will discuss code optimization and code generation phases of the
compiler.
• If the number of results exceeds the number of available registers, then some
results are moved to memory.
• The following issues can be handled by using register descriptor:
1. How to move the results between memory and CPU registers.
2. How to know which partial result is contained in a register.
• Register descriptor is used to maintain the register status.
• In this section we will discuss operand descriptors and register descriptors in detail.
;
T : T * F  { $$ = codegen('*', $1, $3); }
  | F      { $$ = $1; }
  ;
F : id     { $$ = getreg($1); }
  ;
%%
getreg (operand)
{
    i = i + 1;
    operand_descr[i] = ((type), (addressability_code, address));
    return i;
}
Fig. 5.3: Code Generator
The getreg routine returns the location L that holds the value of id in the assignment
statement. When an operator is reduced by the parser, the function 'codegen' is called
with that operator and the descriptors of its operands as parameters.
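The following is a minimal sketch, in Python, of how this descriptor bookkeeping might look; the names operand_descr, getreg and codegen follow the fragment above, but the list layout and the instruction selection are assumptions, not the book's exact scheme:

operand_descr = []   # each entry: (type, (addressability_code, address))

def getreg(name, typ="int"):
    # create an operand descriptor for an identifier held in memory
    operand_descr.append((typ, ("M", "addr(" + name + ")")))
    return len(operand_descr) - 1          # the descriptor's number

def codegen(op, left, right):
    # emit code for 'left op right'; the result is left in AREG
    mnemonic = {"+": "ADD", "-": "SUB", "*": "MULT"}[op]
    print("MOVER AREG,", operand_descr[left][1][1])
    print(mnemonic, "AREG,", operand_descr[right][1][1])
    # descriptor for the partial result now held in the register
    operand_descr.append(("int", ("R", "addr(AREG)")))
    return len(operand_descr) - 1

a, b = getreg("a"), getreg("b")
t = codegen("*", a, b)   # descriptors: a, b, and the partial result a * b

A full generator would, of course, first check whether the left operand is already in AREG before emitting the MOVER.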
Example 2: Consider the expression a * b + c. The code generated for this expression
is:
MOVER AREG, A
MULT AREG, B
MOVEM AREG, TEMP
ADD AREG, C
Five operand descriptors are used during code generation. Assume a, b and c are
integers of size 1 memory word.
The operand descriptors are as follows:
1. (int, 1) M, addr(a)      Descriptor for a
2. (int, 1) M, addr(b)      Descriptor for b
3. (int, 1) R, addr(AREG)   Descriptor for a * b
4. (int, 1) M, addr(c)      Descriptor for c
5. (int, 1) R, addr(AREG)   Descriptor for a * b + c
Operand Descriptors              Operand Descriptors
1 (int, 1) M, addr(a)            1 (int, 1) M, addr(a)
2 (int, 1) M, addr(b)            2 (int, 1) M, addr(b)
3 (int, 1) R, addr(AREG)         3 (int, 1) M, addr(temp[1])
4 (int, 1) M, addr(c)            4 (int, 1) M, addr(c)
5 (int, 1) M, addr(d)            5 (int, 1) M, addr(d)
                                 6 (int, 1) R, addr(AREG)
Register Descriptor              Register Descriptor
Occupied #3                      Occupied #6
(a)                              (b)
Fig. 5.4: Operand and Register Descriptors
o Here, in (a) the register descriptor indicates that AREG holds the operand
described by operand descriptor #3, the partial result a * b. In (b), that value
has been moved to memory location temp[1], and the register descriptor #6
indicates that AREG now holds the partial result c * d, described by operand
descriptor #6.
Example 5: Consider a statement, a * b + c * d * (e + f) + c * d. Show the contents of
operand descriptors and register descriptor after complete code generation.
Solution: The code generated is:
MOVER AREG, A
MULT AREG, B
MOVEM AREG, TEMP1
MOVER AREG, C
MULT AREG, D
MOVEM AREG, TEMP2
MOVER AREG, E
ADD AREG, F
MULT AREG, TEMP2
ADD AREG, TEMP1
MOVEM AREG, TEMP1
MOVER AREG, C
MULT AREG, D
ADD AREG, TEMP1
Operand Descriptors
1. (int, 1) M, addr (a)
2. (int, 1) M, addr (b)
3. (int, 1) M, addr (temp[1])
4. (int, 1) M, addr (c)
5. (int, 1) M, addr (d)
6. (int, 1) M, addr (temp[2])
7. (int, 1) M, addr (e)
8. (int, 1) M, addr (f)
9. (int, 1) M, addr (temp[2])
10. (int, 1) M, addr (temp[1])
11. (int, 1) M, addr (temp[1])
12. (int, 1) M, addr (c)
13. (int, 1) M, addr (d)
14. (int, 1) M, addr (temp[1])
Register Descriptor
Occupied # 14
<operand> → <operand><operand> +
<operand> → <operand><operand> *
<operand> → var
where, var denotes any variable.
• Consider the grammar of expressions:
E → E + E | E * E | id
The operator precedence matrix (rows: LHS symbol; columns: RHS symbol) is as
shown below:
        id      +       *
id              ·>      ·>
+       <·      ·>      <·
*       <·      ·>      ·>
Procedure: To parse a sentence by using operator precedence symbols:
Step 1: Put <· at the left end of the input and put ·> at the right end of the input.
Step 2: Remove all non-terminals from the input.
Step 3: Put a precedence operator between every two terminals.
Step 4: Reduce the innermost sentence enclosed between <· and ·>.
Step 5: Continue till no more terminals are present.
• To implement this procedure for the compilation of an expression, a stack is used.
Symbols are shifted onto the stack till ·> is found; symbols are then popped till <·
is found, and the enclosed string is reduced. This is continued till the entire
sentence is parsed. Here, we will discuss how an expression string is parsed and
converted into postfix notation.
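As a sketch of this shift/reduce idea, the following Python routine converts such an infix string to postfix using a stack; it uses standard textbook precedences (with ↑ written as ^ and treated as right-associative) rather than the <· / ·> matrix itself, so the mechanism is only analogous to the one described above:

PREC = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}

def to_postfix(tokens):
    out, stack = [], []
    for tok in tokens:
        if tok in PREC:
            # reduce while the stack top has higher precedence, or equal
            # precedence and the incoming operator is left-associative
            while stack and (PREC[stack[-1]] > PREC[tok] or
                             (PREC[stack[-1]] == PREC[tok] and tok != "^")):
                out.append(stack.pop())
            stack.append(tok)            # shift the operator
        else:
            out.append(tok)              # operands go straight to the output
    while stack:
        out.append(stack.pop())
    return out

print(" ".join(to_postfix("a + b * c + d * e ^ f".split())))
# prints: a b c * + d e f ^ * +

This matches the postfix form of string (5.1) derived below.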
• Consider the source string in infix form,
|− a + b * c + d * e ↑ f −|    … (5.1)
(the digits 2 1 5 4 3 written below the operators give the order in which they are
reduced)
[Fig. 5.5 shows the stack of ARp at this point: the pending operators and operands
(* c, + b, a) are held as local data in the activation record, with TOS marking the
top of stack. The code generated for the first partial result is:]
LOAD b
MULT c
ADD a
STORE TEMP
Fig. 5.5: Stack and Partial Implementation of ARp
• This partial result is stored in TEMP, i.e., a temporary in ARp (the activation
record), and evaluation continues until no more terminals are present.
• The postfix form of the intermediate code is,
|− a b c * + d e f ↑ * + −|
(the digits 1 2 3 4 5 below the operators again give the order of evaluation)
• Each instruction in the triples representation is divided into three fields: op, arg1
and arg2. The fields arg1 and arg2, for the arguments of op (the operator), are
either pointers to the symbol table or pointers into the triple structure. Since three
fields are used, this intermediate code format is known as triples.
• A quadruple is a record structure with four fields which we call op, arg 1, arg2 and
result. The op field contains an internal code for the operator. The three-address
statement x = y op z is represented by placing y in arg1, z in arg2 and x in result.
• Statements with unary operators like x = -y or x = y do not use arg2. Operators like
param (Parameter Operator) use neither arg2 nor result. Conditional and
unconditional jumps put the target label in result.
• The contents of fields arg1, arg2 and result are normally pointers to the symbol table
entries for the names represented by these fields.
1. Triples: [Oct. 16, 17, April 19]
• Triples have three fields to implement the three-address code. The fields of a triple
contain the operator, the first source operand and the second source operand.
• A triple is a representation of an elementary operation in the following tabular form:
Operator Operand 1 Operand 2
• To avoid entering temporary names into the symbol table, we might refer to a
temporary value by the position of the statement that computes it. So, we use only
three fields to represent a statement.
• Since three fields are used, the intermediate code format is known as triples. Every
triple has its own number.
• Each operand of a triple is either a variable, a constant, or the result of some
evaluation represented by another triple.
• If the result of a triple is used as an operand in a later evaluation, the operand
field of that later triple contains the number of the triple computing the result.
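A minimal Python sketch of such a triple table (the tuple layout is an assumption; triple numbers start at 1 and are used directly as operands):

triples = []                    # each entry: (op, arg1, arg2)

def emit(op, arg1, arg2=None):
    triples.append((op, arg1, arg2))
    return len(triples)         # this triple's number

# string (5.1): a + b * c + d * e ^ f
t1 = emit("*", "b", "c")
t2 = emit("+", t1, "a")
t3 = emit("^", "e", "f")
t4 = emit("*", "d", t3)
t5 = emit("+", t2, t4)
for number, triple in enumerate(triples, start=1):
    print(number, *triple)

This prints exactly the five rows of Fig. 5.6 below.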
Example 6: Consider the expression a + b * c + d * e ↑ f (string 5.1).
The postfix form is a b c * + d e f ↑ * +.
Fig. 5.6 shows the triples for this expression.
Operator   Operand 1   Operand 2
1   *   b   c
2   +   1   a
3   ↑   e   f
4   *   d   3
5   +   2   4
Fig. 5.6: Triples for String |− a + b * c + d * e ↑ f −|
Example 7: Consider the expression, a = b * − c + b * − c.
The triples representation is,
Triple No.   Operator   Operand 1   Operand 2
1   uminus   c   −
2   *   b   1
3   uminus   c   −
4   *   b   3
5   +   2   4
6   assign   a   5
Example 8: Show the triple representation of x = y[i].
The triple representation is,
Operator Operand 1 Operand 2
1 =[] y i
2 assign x 1
A hash organization can be used for the table of triples. Triples are useful in code
optimization.
2. Indirect Triples:
• To eliminate common sub-expressions which are used in more than one place in a
program, indirect triples are useful. Indirect triples are useful in optimizing
compilers.
• A program statement is represented as a list of triple numbers. While processing a
new expression, occurrences of identical expressions are detected, the triple
number for that expression is found in the triple table, and the statement table is
formed.
• The indirect triples representation saves memory (storage economy). It is also used
in a form of optimization called common subexpression elimination.
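The following Python sketch shows the idea: identical triples are detected by their signature and entered only once, while each statement is a list of triple numbers (the helper name and table layout are assumptions):

triple_table = []           # (op, arg1, arg2), numbered from 1
index = {}                  # signature -> existing triple number

def triple(op, arg1, arg2=None):
    key = (op, arg1, arg2)
    if key not in index:    # enter the triple only if it is new
        triple_table.append(key)
        index[key] = len(triple_table)
    return index[key]

# y = x + b * c ;  w = v + b * c   (b * c is entered only once)
stmt1 = [triple("*", "b", "c")]
stmt1.append(triple("+", "x", stmt1[0]))
stmt2 = [triple("*", "b", "c")]      # returns the existing number 1
stmt2.append(triple("+", "v", stmt2[0]))
print(stmt1, stmt2)                  # [1, 2] [1, 3]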
Example 9: Fig. 5.7 shows the indirect triples representation for the program
segment,
z = a + b * c + d * e ↑ f;
y = x + b * c
The use of b * c in both statements is reflected by the fact that triple number 1
appears in the list of triples for both statements.
Operator  Operand 1  Operand 2        Stmt. No.   Triple Nos.
1.  *  b  c                            1           1, 2, 3, 4, 5
2.  +  1  a                            2           1, 6
3.  ↑  e  f
4.  *  d  3
5.  +  2  4
6.  +  x  1
(a) Triple's Table                    (b) Statement Table
Fig. 5.7: Indirect Triples
Example 10: Consider the expression, a = b * − c + b * − c.
The indirect triples representation is:
Operator  Operand 1  Operand 2        Stmt. No.   Triple Nos.
1.  uminus  c  −                       1           1, 2, 3, 4
2.  *  b  1
3.  +  2  2
4.  assign  a  3
(a) Triple's Table                    (b) Statement Table
Fig. 5.8
3. Quadruples: [Oct. 16, 17, April 17, 19]
• A quadruple is a record structure with four fields:
Operator Operand 1 Operand 2 Result name
• Here, result name is the result of the evaluation. It can be used as the operand of
another quadruple.
• When an expression is to be moved from one part of a program to another during
program optimization, triples are not suitable because the triple numbers would
change. Hence quadruples, which designate a subexpression by a temporary name
rather than by a number, are more convenient.
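A sketch of quadruple generation in Python; each result gets a fresh temporary name, so a quadruple can be moved without renumbering anything (the helper names and tuple layout are assumptions):

quads = []
temp_count = 0

def new_temp():
    global temp_count
    temp_count += 1
    return "t%d" % temp_count

def quad(op, arg1, arg2=None, result=None):
    result = result or new_temp()    # fresh temporary unless one is given
    quads.append((op, arg1, arg2, result))
    return result

# a = b * -c + b * -c
t1 = quad("uminus", "c")
t2 = quad("*", "b", t1)
t3 = quad("uminus", "c")
t4 = quad("*", "b", t3)
t5 = quad("+", t2, t4)
quad(":=", t5, None, "a")
for q in quads:
    print(q)

This reproduces the table of Example 12 below.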
Example 11: Fig. 5.9 shows the quadruples for expression string 5.1.
Operator  Operand 1  Operand 2  Result name
1.  *  b  c  t1
2.  +  t1  a  t2
3.  ↑  e  f  t3
4.  *  d  t3  t4
5.  +  t2  t4  t5
Fig. 5.9: Quadruples
Example 12: Consider the expression,
a = b * − c + b * − c
The quadruples representation is:
Operator  Operand 1  Operand 2  Result name
1.  uminus  c      t1
2.  *  b  t1  t2
3.  uminus  c      t3
4.  *  b  t3  t4
5.  +  t2  t4  t5
6.  :=  t5      a
Fig. 5.10: Representation of Quadruples
• The contents of fields operand1, operand2 and result are normally pointers to the
symbol table entries for the names represented by these fields.
• If so, temporary names must be entered into the symbol table as they are created and
location is easily accessed via the symbol table.
Comparison of Quadruples and Triples:
(i) Quadruples are more useful in an optimizing compiler, where statements are often
moved around. Triples are less suitable for an optimizing compiler.
(ii) When an expression is to be moved from one part of a program to another part
during program optimization, triple numbers would change. In quadruples, if we
move a statement computing x, the statement using x requires no change.
(iii) Indirect triples look very much like quadruples as far as utility is concerned.
They use the same amount of space and are equally efficient for recording the
code.
(iv) Indirect triples can save some space compared with quadruples if the same
temporary value is used more than once.
4. Expression Tree: [April 17]
• The expression tree is an important form which reduces the number of machine
instructions produced during code generation.
• An expression tree, as the name suggests, is nothing but an expression arranged in
a tree-like structure, in which each internal node corresponds to an operator and
each leaf node corresponds to an operand.
• For example, consider the expression (A + B) / (C – D). The expression tree is
shown in Fig. 5.11: the root is the operator /, its left subtree is + with children
A and B, and its right subtree is – with children C and D.
Fig. 5.11: The Expression Tree
• The code generated is as follows:
Left-to-right Code          Right-to-left Code
MOVER AREG, A               MOVER AREG, C
ADD AREG, B                 SUB AREG, D
MOVEM AREG, TEMP1           MOVEM AREG, TEMP1
MOVER AREG, C               MOVER AREG, A
SUB AREG, D                 ADD AREG, B
MOVEM AREG, TEMP2           DIV AREG, TEMP1
MOVER AREG, TEMP1
DIV AREG, TEMP2
• Instead of always generating code from left to right, the right subtree is evaluated
before the left subtree if the register requirements of both subtrees of an operator
are identical; as shown above, this saves one temporary store.
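This choice can be formalized by the well-known Sethi-Ullman labelling, which computes for every node the number of registers needed to evaluate its subtree; the heavier subtree is then evaluated first. A minimal Python sketch, assuming trees are tuples (op, left, right) and leaves are strings:

def label(node, is_left=True):
    if isinstance(node, str):
        # a left-operand leaf needs a register; a right-operand
        # leaf can be used directly from memory
        return 1 if is_left else 0
    _, left, right = node
    l = label(left, True)
    r = label(right, False)
    return max(l, r) if l != r else l + 1

tree = ("/", ("+", "A", "B"), ("-", "C", "D"))
print(label(tree))     # 2: with one register, a temporary store is needed

For the tree of Fig. 5.11 both subtrees need one register, so the whole expression needs two; with a single register AREG, one subtree must be stored in a temporary, exactly as in the code above.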
• The goals of optimization are the reduction of execution time and the improvement in
memory usage.
• Efficiency is achieved by:
1. Eliminating the redundancies in a program.
2. Rearranging or rewriting computations in a program.
3. Using appropriate code generation strategies.
• Fig. 5.12 shows the optimizing compiler. Code optimization depends upon the
intermediate representation of the source code.
• The front end generates the intermediate code, which consists of triples and
quadruples.
• To improve the efficiency of program, an optimizing transformation is needed. The
Transformed Intermediate Code (IR) is input to the code generation phase.
Source program → Front End → Optimization phase → Code generation → Target code
Fig. 5.12: Optimizing Compiler Schematic
• The optimization techniques are independent of both the programming language and
the target machine.
Advantages of Code Optimization:
1. An optimized program typically occupies less storage (say, 25 per cent less) and
executes severalfold (say, three times) faster than the un-optimized program.
2. It reduces the cost of execution.
Disadvantage:
1. Around 40% extra compilation time may be needed.
………                        ………
b = x * y + 10      ⇒      b = t + 10
• Here, the subexpression x * y occurs twice. The second occurrence of x * y can be
eliminated because the first occurrence of x * y is always evaluated before the
second occurrence during execution. The first result of x * y is saved in t, and this
value is used in the assignment to b.
• For example,
t1 := 4 * i          t1 := 4 * i
x := a[t1]           x := a[t1]
t2 := 4 * i          t2 := t1
t3 := 4 * j    ⇒     t3 := 4 * j
t4 := a[t3]          t4 := a[t3]
t5 := 4 * j          t5 := t3
a[t5] := x           a[t5] := x
• Here, the assignments to t2 and t5 have the common subexpressions 4 * i and 4 * j
respectively; they have been eliminated by using t1 instead of t2 and t3 instead
of t5.
Implementation:
1. Expressions which result in the same value are identified.
2. These expressions are easily identified using triples and quadruples.
3. The equivalence of expressions is determined by considering whether their
operands have the same values at all occurrences.
4. If the subexpressions have the same value, the later expression can be
eliminated, as sketched below.
The use of algebraic equivalence improves the effectiveness of optimization but
increases its cost.
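A minimal Python sketch of steps 1 to 4 over quadruples (op, arg1, arg2, result); expressions are matched by their signature, and an assignment to a name kills the expressions that depend on it (the names and layout are assumptions):

def eliminate_cse(quads):
    available, out = {}, []         # (op, arg1, arg2) -> name holding it
    for op, a1, a2, res in quads:
        key = (op, a1, a2)
        hit = key in available
        out.append(("copy", available[key], None, res) if hit
                   else (op, a1, a2, res))
        # redefining res kills expressions using it or held in it
        available = {k: v for k, v in available.items()
                     if res not in (k[1], k[2]) and v != res}
        if not hit:
            available[key] = res
    return out

quads = [("*", "x", "y", "t1"),
         ("*", "x", "y", "t2"),     # becomes a copy of t1
         ("+", "t2", "z", "t3")]
for q in eliminate_cse(quads):
    print(q)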
5.3.1.3 Dead Code Elimination [April 18]
18]
• Dead code is the code which can be omitted from a program without affecting the
results.
• Dead code is detected by checking whether the value assigned in an assignment
statement is used anywhere in the program.
• Dead code is nothing but useless code: statements that compute values which never
get used.
• Example, consider the following code segment:
{……
….
a=10;
5.16
Compiler Construction Code Generation and Optimization
if (a==10)
{
x++;
printf(“%d”, x);
}
else
{
y++;
printf(“in dead code”)
}
}
• The else part of this segment is dead code: since a is assigned 10 immediately
before the test, the condition (a == 10) is always true and the else branch can
never execute. More generally, an assignment is dead if the value assigned is not
used anywhere in the program, no matter how control flows after executing the
assignment.
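A sketch of this check in Python for straight-line three-address code: walking backwards, an assignment whose result is not live (not used later and not live on exit) is dropped. This assumes no jumps and no side effects; the tuple layout is hypothetical:

def remove_dead(code, live_on_exit):
    live, kept = set(live_on_exit), []
    for result, op, args in reversed(code):
        if result in live:               # the value is needed: keep it
            kept.append((result, op, args))
            live.discard(result)
            live.update(args)            # its operands become live
        # otherwise the assignment is dead and is dropped
    return list(reversed(kept))

code = [("t1", "*", ("b", "c")),
        ("x", "+", ("a", "t1")),
        ("y", "+", ("x", "1"))]          # y is never used afterwards
print(remove_dead(code, live_on_exit={"x"}))
# keeps only the first two statements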
5.3.1.4 Frequency Reduction
• Code from the high execution frequency region of the program can be moved to low
execution frequency region. This reduces the execution time.
• Consider the following code segment:
for (i = 0; i < 10; i ++)
{
a: = a + i * b;
b: = b + i;
c: = x + 10;
d: = d * c + b;
}
• In this loop, the value of 'c' remains the same each time the loop executes. So c can
be calculated outside the loop, and the code is rewritten as:
c := x + 10;
for (i = 0; i < 10; i ++)
{
a: = a + i * b;
b: = b + i;
d: = d * c + b;
}
redundant, or would have been eliminated locally already. Consider global common
subexpression elimination.
• If some expression x * y occurs in block A and also in block B, then the
subexpression in block B is eliminated under two conditions:
1. Block B is executed only after block A.
2. No assignment to x or y has been executed after the last evaluation of x * y in
block A.
• The optimization is done by saving the value of x * y in a temporary location in all
blocks; which satisfies condition 1.
5.4.1 DAG for Expressions [Oct. 16, 17, 18, April 17, 18, 19]
• A Directed Acyclic Graph (DAG) for an expression identifies the common sub-
expressions in the expression.
• A DAG is constructed similarly to a syntax tree. A DAG has a node for every sub-expression
of an expression; an interior node represents an operator and its children represent its
operands.
• The difference between a syntax tree and a DAG is that a node N in a DAG may
have more than one parent, if N represents a common sub-expression.
• In a syntax tree, by contrast, the sub-tree for a common sub-expression is
duplicated as many times as that sub-expression occurs.
• A DAG yields more efficient intermediate code. For example, consider the
expression x + x * (y – z) + (y – z) * a.
• The DAG representation of this expression is shown in Fig. 5.13.
Fig. 5.13: DAG for Expression x + x * (y – z) + (y – z) * a
• In this expression x appears twice; the node for x has two parents because x is
common to the two sub-expressions x and x * (y – z).
• The common sub-expression (y – z) also occurs twice; it is represented by the same
node (–), which has two parents.
• The syntax-directed definition of expressions shown in Fig. 5.14 can be used to
construct either a DAG or a syntax tree.
• A syntax tree for an expression such as x – y + a or x * y + a is obtained from
Fig. 5.14 if the functions Leaf and Node create a fresh node each time they are
called, even when a common sub-expression is present.
• A DAG is created if, before creating a new node, Leaf and Node first check whether
an identical node already exists. If it exists, Node returns the existing node;
otherwise it creates a new node.
• The sequence of steps used to construct the DAG of Fig. 5.13 is shown in Fig. 5.14,
provided Node and Leaf create new nodes only when necessary, returning pointers
to existing nodes with the correct label and children whenever possible.
• In Fig. 5.14, entry-x, entry-y, entry-z and entry-a point to the symbol table entries
for the identifiers x, y, z and a respectively.
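A minimal Python sketch of this check, where leaf and node share nodes through a signature table (the id-numbering scheme is an assumption):

nodes = {}                      # signature -> node id

def leaf(label, entry):
    return nodes.setdefault(("leaf", label, entry), len(nodes) + 1)

def node(op, left, right):
    # return the existing node for (op, left, right) if there is one
    return nodes.setdefault((op, left, right), len(nodes) + 1)

# x + x * (y - z) + (y - z) * a
x, y, z, a = (leaf("id", v) for v in ("x", "y", "z", "a"))
d = node("-", y, z)             # the shared (y - z) node
root = node("+", node("+", x, node("*", x, d)), node("*", d, a))
print("distinct nodes:", len(nodes))   # 9, although the expression has
                                       # 13 operand/operator occurrences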
• The node labelled = has value number 4, and its left and right children have value
numbers 1 and 3 respectively.
• The node labelled + has value number 3, and its left and right children have value
numbers 1 and 2 respectively.
• Thus, value numbers help to construct expression DAGs efficiently; each node is
referred to by its value number.
• The signature of an interior node is a triple (op, l, r), where op is the label, l the
value number of the left child, and r the value number of the right child.
• Let us discuss how value-numbers are useful for constructing the nodes of a DAG.
Constructing Nodes of a DAG:
• Search the array for a node N with signature (op, l, r). If there is such a node,
return the value number of N; otherwise create a new node M with signature
(op, l, r) and return its value number.
• How do we determine whether the node is already in the array? The most efficient
approach is to use a hash table: the nodes are put into buckets, and each bucket
will have only a few nodes.
• The hash function h computes the number of a bucket from the values of op, l
and r. It will always return the same bucket number for node (op, l, r).
• The bucket index h(op, l, r) is computed, and if N is not in bucket h(op, l, r), then
a new node M is created and added to this bucket.
• The buckets can be implemented as linked lists, as shown in Fig. 5.16.
Fig. 5.16: Hash Table for Searching Buckets
• Each cell in a linked list represents a node. The bucket headers, consisting of
pointers to the first cell in each list, are stored in an array.
• To find node (op, l, r), the list whose header is at index h(op, l, r) of the array is
searched; the search succeeds if a cell with that signature is found.
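The following Python sketch mirrors this search (the fixed number of buckets and the hash function are assumptions):

NBUCKETS = 7
buckets = [[] for _ in range(NBUCKETS)]  # cells: (op, l, r, value number)
counter = 0

def get_node(op, l=None, r=None):
    # return the value number of node (op, l, r), creating it if absent
    global counter
    bucket = buckets[hash((op, l, r)) % NBUCKETS]
    for op2, l2, r2, vn in bucket:
        if (op2, l2, r2) == (op, l, r):
            return vn                    # found: reuse the existing node
    counter += 1
    bucket.append((op, l, r, counter))
    return counter

# (a + b) + (a + b), as in Example 13 below
a = get_node("id", "a")                  # value number 1
b = get_node("id", "b")                  # value number 2
root = get_node("+", get_node("+", a, b), get_node("+", a, b))
print(root)                              # 4: both children are node 3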
Example 13: Construct the DAG and identify the value numbers for the
subexpressions of the following expression: (a + b) + (a + b).
Solution: The leaves a and b get value numbers 1 and 2. The shared subexpression
a + b is represented by a single node with value number 3, and the root + node, both
of whose children are the node numbered 3, gets value number 4.
• Basic blocks play an important role in identifying variables, which are being used
more than once in a single basic block.
• If any variable is being used more than once, the register memory allocated to that
variable need not be emptied unless the block finishes execution.
Definition of Basic Block:
• A basic block is a sequence of consecutive statements in which the flow of control
enters at the beginning and leaves at the end, without halting or branching except
at the last instruction.
Algorithm: Partitioning three-address instructions into basic blocks.
Method:
1. We determine the set of leaders, the first statements of basic blocks.
The rules for finding the leaders are:
(i) The first three-address instruction in the IR is a leader.
(ii) Any instruction that is the target of a conditional or unconditional goto
(jump) is a leader.
(iii) Any instruction that immediately follows a goto or conditional goto (jump)
statement is a leader.
2. Each basic block consists of a leader and the statements up to the next leader;
a sketch of this method in code follows.
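A Python sketch of the method (instructions are numbered from 1; jump targets are given explicitly, which is an assumption about the representation):

def find_blocks(code):
    # code: list of (text, jump_target or None), numbered from 1
    leaders = {1}                        # rule (i)
    for i, (_, target) in enumerate(code, start=1):
        if target is not None:
            leaders.add(target)          # rule (ii)
            if i < len(code):
                leaders.add(i + 1)       # rule (iii)
    cuts = sorted(leaders) + [len(code) + 1]
    return [list(range(cuts[k], cuts[k + 1]))
            for k in range(len(cuts) - 1)]

code = ([("prod = 0", None), ("i = 1", None)] +
        [("stmt %d" % n, None) for n in range(3, 12)] +
        [("if i <= 10 goto 3", 3)])
print(find_blocks(code))    # [[1, 2], [3, 4, ..., 12]], as in Example 14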
Example 14: Consider the fragment of code shown in Fig. 5.19, which computes the
dot product of two vectors of size 10.
{
prod=0;
i=1;
do
{
prod=prod+a[i]*b[i];
i=i+1;
}
while (i ≤ 10);
}
Fig. 5.19
The three-address statements are,
1. prod = 0
2. i = 1
3. t1 = 4 * i // (4 * i) is the byte offset into the arrays
4. t2 = a [t1]
5. t3 = 4 * i
6. t4 = b [t3]
7. t5 = t2 * t4
8. t6 = prod + t5
9. prod = t6
10. t7 = i + 1
11. i = t7
12. if i ≤ 10 goto 3
Fig. 5.20: Three-address Code for Fig. 5.19
Here, there are two basic blocks. Statements 1 and 2 are in the 1st basic block. The
second basic block has leader statement no. 3 and contains statements 3 to 12: the
loop starts from statement 3, and statement 12 has a jump back to statement 3, so 3
is a leader.
So in Fig. 5.20 the leaders are statements 1 and 3. Many transformations can be
applied to a basic block to improve the quality of the code.
Transformations on Basic Blocks:
• Transformations on basic blocks are:
1. Common sub-expression elimination.
2. Dead-code elimination.
3. Renaming of temporary variables.
4. Interchange of statements.
• Let us see above transformations on basic blocks in detail:
1. Common Sub-expression Elimination: Consider the basic block.
(i) a = b + c                       a = b + c
(ii) c = a + d    Transformation    c = a + d
(iii) x = b + c                     x = b + c
(iv) y = a + d                      y = c
• The second and fourth statements compute the same expression, a + d (which is
b + c + d once the value of a is substituted), so the basic block is transformed as
shown above.
• The first and third statements compute different results, because the third
statement uses the new value of c: its expression is b + (a + d), while the first
statement computes b + c with the old c. So they are not the same.
2. Dead Code Elimination:
• Dead code is one or more than one code statements. The dead code plays no role in
any program operation and therefore it can simply be eliminated.
• If we remove any root node from DAG which is never subsequently used (dead), then
repeated application of such transformation will remove all nodes from the DAG that
corresponds to dead code.
3. Renaming Temporary Variables:
• Suppose t = a + b where t is temporary. We change to x = a + b where x is new
temporary variable. Then the value of basic block is not changed.
4. Interchange of Statements:
• We can interchange two adjacent statements without affecting the value of the
basic block, provided neither statement uses the result of the other, for example:
t1 = a + b
t2 = x + y
[Fig. 5.22 shows a flow graph with added Entry and Exit nodes and basic blocks
B2 (y = x; x++), B3 (y = z; z++) and B4 (w = x + z), with B4 leading to Exit.]
Fig. 5.22: Flow Graph
Here we add two nodes, entry and exit, that do not correspond to executable
instructions. Often the basic blocks hold intermediate code in the form of
quadruples. If quadruples are moved during code optimization, this causes a
problem with quadruple-number references in the jump statements at the ends of
basic blocks. Thus, we prefer to make jumps point to blocks rather than to
quadruples, as shown in Fig. 5.22.
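A Python sketch of building the edges of such a flow graph from the blocks found earlier (the block and edge representation is an assumption): each block gets an edge to the target of its final jump, and to the next block when the last instruction can fall through.

def flow_graph(blocks, code):
    first = {blk[0]: i for i, blk in enumerate(blocks)}
    edges = []
    for i, blk in enumerate(blocks):
        text, target = code[blk[-1] - 1]        # the block's last instruction
        if target is not None:
            edges.append((i, first[target]))    # jump edge
        if i + 1 < len(blocks) and not text.startswith("goto"):
            edges.append((i, i + 1))            # fall-through edge
    return edges

blocks = [[1, 2], list(range(3, 13))]           # from the sketch above
code = ([("prod = 0", None), ("i = 1", None)] +
        [("stmt %d" % n, None) for n in range(3, 12)] +
        [("if i <= 10 goto 3", 3)])
print(flow_graph(blocks, code))                 # [(0, 1), (1, 1)]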
Example 16: Construct the three-address code and flow graph of the following code
segment:
void quicksort (m, n)
int m, n
{
int i, j, v, x;
if (n ≤ m) return;
i = m – 1;
j = n; v = a [n];
while (1)
{
do i = i + 1; while (a[i] < v);
do j = j – 1; while (a[j] > v);
if (i >= j) break;
x = a[i]; a[i] = a[j]; a[j] = x; // swap
}
x = a[i]; a[i] = a[n]; a[n] = x;
quicksort(m, i); quicksort(i + 1, n);
}
Solution:
(1) i=m–1 (16) t7 = 4 * i
(2) j=n (17) t8 = 4 * j
(3) t1 = 4 * n (18) t9 = a [t8]
(4) v = a [t1] (19) a [t7] = t9
(5) i=i+1 (20) t10 = 4 * j
(6) t2 = 4 * i (21) a [t10] = x
(7) t3 = a [t2] (22) goto (5)
(8) if t3 < v goto (5) (23) t11 = 4 * i
(9) j=j–1 (24) x = a [t11]
(10) t4 = 4 * j (25) t12 = 4 * i
(11) t5 = a [t4] (26) t13 = 4 * n
(12) if t5 > v goto (9) (27) t14 = a [t13]
(13) if i >= j goto (23) (28) a [t12] = t14
(14) t6 = 4 * i (29) t15 = 4 * n
(15) x = a [t6] (30) a [t15] = x
Fig. 5.23: Three-address Code
There are 6 basic blocks. Block B1 starts with the first instruction. Statement 5 is the
leader of block B2, where the first do loop starts. Statement 9 is the leader of block
B3, where the next do loop starts. Statement 13, the conditional jump, is the leader of
block B4. Statement 14 is the leader of block B5, where the loop body ends. Statement
23 is the leader of block B6, being the target of the jump from block B4.
A DAG is,
Fig. 5.25: DAG for Basic Block
Example 18: Construct DAG for the block.
1. t1 = 4 * i 7. prod = t6
2. t2 = a [t1] 8. t7 = i + 1
3. t3 = 4 * i 9. i = t7
4. t4 = b [t3] 10. if i ≤ 10 goto (1)
5. t5 = t2 * t4
6. t6 = prod + t5
Solution: DAG is shown in Fig. 5.26.
5. For the fourth statement t4 = b [t3], we create a new leaf node b; t3 already
exists. So create a new node labelled [ ] which has two children, b and t3.
6. For fifth statement t5 = t2 * t4, create a new node for operator * labelled t5 whose
children are t2 and t4.
7. For the sixth statement t6 = prod + t5, create a new node for operator + labelled
t6, whose children are prod and t5.
8. For the seventh statement prod = t6, add the identifier prod to the list of
identifiers for node t6.
9. For eighth statement t7 = i + 1, add new leaf labelled 1 and new node + labelled t7.
10. For i = t7, node i is appended to t7 identifier list.
11. For the last statement, i is the left child and 10 is the right child of the operator ≤ node.
Example 19: Construct a DAG for the block:
a = b+c
b = b–c
c = c+d
x = b+c
Solution:
Fig. 5.27: Output of Example 19
Example 20: Construct a DAG for the block:
b = a [i]
a [j] = d
e = a [i]
Solution:
Fig. 5.28: Output of Example 20
PRACTICE QUESTIONS
Q. I Multiple Choice Questions:
1. Which compiler phase gets the intermediate code as input and produces optimized
intermediate code as output?
(a) code generation (b) code optimization
(c) lexical analysis (d) syntax analysis
2. Which is the final phase of a compiler, getting input from the code optimization
phase and producing the target code or object code as result?
(a) code generation (b) code optimization
(c) lexical analysis (d) syntax analysis
3. Source code generally has a number of instructions which are always executed
in sequence and are considered as the basic blocks of the,
(a) token (b) error
(c) code (d) tree
4. Which graph depicts how the program control is being passed among the blocks?
(a) dependency (b) parse
(c) directed acyclic (d) control flow
5. Which transformation removes code that plays no role in any program operation
and therefore can simply be eliminated?
(a) dead code elimination (b) sleep code elimination
(c) live code elimination (d) None of the mentioned
6. Which has to track both the registers (for availability) and addresses (location of
values) while generating the code?
(a) syntax generator (b) semantic generator
(c) code generator (d) None of the mentioned
7. An operand descriptor consists of,
(a) Attributes: the subfields type, length and other information of the operand.
(b) Addressability: specifies the location of the operand and also how the
operand can be accessed.
(c) Address is address of CPU register or memory location.
(d) All of the mentioned
8. In which are expressions arranged in a tree-like structure, where each internal
node corresponds to an operator and each leaf node corresponds to an operand?
(a) Expression tree (b) Syntax tree
(c) Parse tree (d) None of the mentioned
9. In which notation does the expression place the operator at the right end, such
as xy+?
(a) Polish (b) Postfix
(c) Infix (d) Both (a) and (b)
10. Intermediate code generation produces intermediate representations for the
source program which are of the following forms,
(a) Postfix notation (b) Three address code
(c) Expression tree (d) All of the mentioned
11. Which have three fields (operator, operand1, operand2) to implement the three
address code?
(a) quadruples (b) triples
(c) postfix (d) None of the mentioned
12. Compiler optimization is generally implemented using a sequence of optimizing
transformations such as,
(a) Local (b) Global
(c) Both (a) and (b) (d) None of the mentioned
13. Which is a type of intermediate code that can contain at most three operands?
For example, if 'x = y + z' is an input statement, it is written in three-address
code as '+ y, z, x', which means add y to z and store the result in x.
(a) Three-address code (b) Quadruples-address code
(c) triples-address code (d) None of the mentioned
Answers
1. (b) 2. (a) 3. (c) 4. (d) 5. (a) 6. (c) 7. (d) 8. (a) 9. (d) 10. (d)
11. (b) 12. (c) 13. (a)
Q. II Fill in the Blanks:
1. _______ generation phase translates the intermediate code representation of the
source program into the target language program.
2. Code _______ translates the intermediate code into the machine code of the
specified computer.
3. The code which is used for the conversion of source code into machine code is
termed as _______ code and lies in the middle of source code and machine code.
4. _______ acyclic graph is a tool that depicts the structure of basic blocks, helps to see
the flow of values flowing among the basic blocks, and offers optimization too.
5. The _______ of operations can be reduced by replacing them with other
operations that consume less time and space, but produce the same result.
6. Optimization is a program _______ technique, which tries to improve the code by
making it consume less resources (i.e. CPU, Memory) and deliver high speed.
7. In _______ -independent optimization, the compiler takes in the intermediate code
and transforms a part of the code that does not involve any CPU registers and/or
absolute memory locations.
8. _______ descriptor is used to inform the code generator about the availability of
registers.
9. A sequence of three address statements (a statement involving no more than three
references (two for operands and one for result)) is known as _______ -address
code.
10. The intermediate code generator, which is usually the front end of a compiler, is
used to _______ intermediate code.
11. _______ optimization is performed within a block segment in sequential nature.
The certain optimizations, e.g. loop optimization are beyond the scope of local
optimization.
12. Basic _______ play an important role in identifying variables, which are being used
more than once in a single basic block.
13. In _______ each instruction in triples presentation has three fields namely,
operator, operand1 and operand2 and the results of respective sub-expressions are
denoted by the position of expression.
14. There are expressions that consume more CPU cycles, time, and memory and these
expressions should be replaced or reduced (strength reduction) with cheaper
expressions without compromising the output of _______ .
15. Optimization can be done by removing unnecessary _______ lines so that it takes
low memory and less execution time.
Answers
1. Code 2. generator 3. intermediate 4. Directed
5. strength 6. transformation 7. machine 8. Register
9. three 10. generate 11. Local 12. blocks
13. Triples 14. expression 15. code
Q. III State True or False:
1. Code generator takes optimized code as an input and generates the target code for
the machine.
2. Basic blocks in a program can be represented by means of control flow graphs.
3. Code optimization phase attempts to improve the intermediate code, so that faster
running machine code will result with less memory space.
4. Operand descriptor keeps track of values stored in each register.
5. Optimization means making the code shorter and less complex, so that it can
execute faster and takes lesser space.
6. Code generation takes the optimized intermediate code as input and maps it to the
target machine language.
7. Machine-dependent optimization is done after the target code has been generated
and when the code is transformed according to the target machine architecture.
8. Generation of the code is often performed at the end of the development stage
since it reduces readability and adds code that is used to increase the performance.
9. In basic blocks when the first instruction is executed, all the instructions in the
same basic block will be executed in their sequence of appearance without losing
the flow control of the program.
10. A three-address code has at most three address locations to calculate the
expression.
11. Dead code is one or more than one code statements, which are either never
executed or unreachable and/or if executed, their output is never used.
12. In Quadruples each instruction in quadruples presentation is divided into four
fields namely, operator, operand1, operand2 and result.
Answers
1. (T) 2. (T) 3. (T) 4. (F) 5. (T) 6. (T) 7. (T) 8. (F) 9. (T) 10. (T)
11. (T) 12. (T)
Q. IV Answer the following Questions:
(A) Short Answer Questions:
1. What is the purpose of the code generation phase of a compiler?
2. Give the function of code optimization.
3. Define triples.
4. Which intermediate code representations of expression are suitable for optimizing
compilers?
5. Define postfix string.
6. Define flow graph.
7. Why basic block is transformed into DAG? Give reason.
8. Define the term 'basic block'.
9. Define DAG.
10. List code optimization techniques.
11. Give the DAG representation for the following basic block :
x = a[i]
a[j] = y
12. List types of descriptors.
13. What is the use of register descriptor?
14. Define quadruple.
15. State any two advantages of code optimization.
16. Define dead code.
17. Define optimization.
18. What is frequency reduction?
19. Define basic block.
(B) Long Answer Questions:
1. What is code generation? Explain in detail.
2. What is code optimization? Explain in detail.
3. Write a short note on: Compilations of expressions.
4. What are register and operand descriptors? Explain with example. Also
differentiate them.
5. Describe intermediate code for expressions in detail.
6. What is three address code? Describe with example.
(ii)
Ans. (i) b * (a + c) + (a + c) * d
(ii) i = i + 5.
April 2017
October 2017
April 2018
October 2018
1. Define Directed Acyclic Graph (DAG). Construct DAG for the following expressions:
(i) (1 + 1*(3–2) + (3–2)*4)
(ii) r + s * t + (s * t)/u. [5 M]
Ans. Refer to Section 5.4.1.
April 2019
Notes :