SYSTEMS PROGRAMMING AND OPERATING SYSTEMS
Information contained in this work has been obtained by Tata
McGraw-Hill, from sources believed to be reliable. However,
neither Tata McGraw-Hill nor its authors guarantee the accuracy or
completeness of any information published herein, and neither Tata
McGraw-Hill nor its authors shall be responsible for any errors,
omissions, or damages arising out of use of this information. This
work is published with the understanding that Tata McGraw-Hill
and its authors are supplying information but are not attempting to
render engineering or other professional services. If such services
are required, the assistance of an appropriate professional should be
sought.
Tata McGraw-Hill
© 1999, 1996, 1993, Tata McGraw-Hill Publishing Company Limited
30th reprint 2009
No part of this publication may be reproduced or distributed in any
form or by any means, electronic, mechanical, photocopying,
recording, or otherwise, or stored in a database or retrieval system
without the prior written permission of the publishers. The program
listings (if any) may be entered, stored and executed in a computer
system, but they may not be reproduced for publication.
This edition can be exported from India only by the publishers,
Tata McGraw-Hill Publishing Company Limited
ISBN-13: 978-0-07-463579-7
ISBN-10: 0-07-463579-4
Published by Tata McGraw-Hill Publishing Company Limited,
7 West Patel Nagar, New Delhi 110 008, and printed at
India Binding House, Noida 201 301
Contents
Preface to the Second Revised Edition
Preface to the Second Edition
Preface to the First Edition
Part I: SYSTEMS PROGRAMMING
1. Language Processors
1.1 Introduction 1
1.2 Language Processing Activities 5
1.3 Fundamentals of Language Processing 9
1.4 Fundamentals of Language Specification 19
1.5 Language Processor Development Tools
Bibliography 34
2. Data Structures for Language Processing
2.1 Search Data Structures 38
2.2 Allocation Data Structures 52
Bibliography 57
3. Scanning and Parsing
3.1 Scanning 59
3.2 Parsing 64
Bibliography 85
4. Assemblers
4.1 Elements of Assembly Language Programming 86
4.2 A Simple Assembly Scheme 91
4.3 Pass Structure of Assemblers 94
4.4 Design of a Two Pass Assembler 95
4.5 A Single Pass Assembler for IBM PC 111
Bibliography 130
5. Macros and Macro Processors
5.2 Macro Expansion 133
5.3 Nested Macro Calls 137
5.5 Design of a Macro Preprocessor 145
Bibliography 161
6. Compilers and Interpreters 162
6.1 Aspects of Compilation 162
6.2 Memory Allocation 165
6.3 Compilation of Expressions 180
6.4 Compilation of Control Structures 192
6.5 Code Optimization 199
6.6 Interpreters 212
Bibliography 218
7. Linkers 221
7.1 Relocation and Linking Concepts 223
7.2 Design of a Linker 228
7.3 Self-Relocating Programs 232
7.4 A Linker for MS DOS 233
7.5 Linking for Overlays 245
7.6 Loaders 248
Bibliography 248
8. Software Tools 249
8.1 Software Tools for Program Development 250
8.2 Editors 257
8.3 Debug Monitors 260
8.4 Programming Environments 262
8.5 User Interfaces 264
Bibliography 269
Part II: OPERATING SYSTEMS
9. Evolution of OS Functions
9.1 OS Functions 273
9.2 Evolution of OS Functions
9.3 Batch Processing Systems 277
9.4 Multiprogramming Systems 287
9.5 Time Sharing Systems 305
9.6 Real Time Operating Systems 311
9.7 OS Structure
Bibliography 317
10. Processes 320
10.1 Process Definition 320
10.2 Process Control 322
10.3 Interacting Processes 327
10.4 Implementation of Interacting Processes 332
10.5 Threads 336
Bibliography 342
11. Scheduling 343
11.1 Scheduling Policies 343
11.2 Job Scheduling 351
11.3 Process Scheduling 353
11.4 Process Management in Unix 365
11.5 Scheduling in Multiprocessor OS 366
Bibliography 368
12. Deadlocks
12.1 Definitions
12.2 Resource Status Modelling 372
12.3 Handling Deadlocks 377
12.5 Deadlock Avoidance 386
12.6 Mixed Approach to Deadlock Handling 393
Bibliography 395
13. Process Synchronization 396
13.1 Implementing Control Synchronization 396
13.2 Critical Sections 399
13.3 Classical Process Synchronization Problems 408
13.4 Evolution of Language Features for Process Synchronization 411
13.5 Semaphores 413
13.6 Critical Regions 419
13.7 Conditional Critical Regions 422
13.8 Monitors 424
13.9 Concurrent Programming in Ada _ 437
Bibliography 443
14. Interprocess Communication 447
14.1 Interprocess Messages 447
14.2 Implementation Issues 448
14.3 Mailboxes 454
14.4 Interprocess Messages in Unix 456
14.5 Interprocess Messages in Mach 458
Bibliography 459
15. Memory Management 460
15.1 Memory Allocation Preliminaries 461
15.2 Contiguous Memory Allocation 471
15.3 Noncontiguous Memory Allocation 479
15.4 Virtual Memory Using Paging 482
15.5 Virtual Memory Using Segmentation
Bibliography 518
16. IO Organization and IO Programming 521
16.1 IO Organization 522
16.2 IO Devices 526
16.3 Physical IOCS (PIOCS) 529
16.4 Fundamental File Organizations 542
16.5 Advanced IO Programming 544
16.6 Logical IOCS 552
16.7 File Processing in Unix 560
Bibliography 560
17. File Systems 561
17.1 Directory Structures 563
17.2 File Protection 569
17.3 Allocation of Disk Space 569
17.4 Implementing File Access 571
17.5 File Sharing 576
17.6 File System Reliability 578
17.7 The Unix File System 584
Bibliography 587
18. Protection and Security 588
18.1 Encryption of Data 588
18.2 Protection and Security Mechanisms 591
18.3 Protection of User Files 592
18.4 Capabilities 596
Bibliography 603
19. Distributed Operating Systems 605
19.1 Definition and Examples 605
19.2 Design Issues in Distributed Operating Systems 608
19.3 Networking Issues 611
19.4 Communication Protocols 615
19.5 System State and Event Precedence 619
19.6 Resource Allocation 622
19.7 Algorithms for Distributed Control 624
19.8 File Systems 633
19.9 Reliability 637
19.10 Security 643
Bibliography 649
Index 653
Preface to the Second Edition
This edition presents a more logical arrangement of topics in Systems Programming and
Operating Systems than the first edition. This has been achieved by restructuring the fol-
lowing material into smaller chapters with specific focus:
• Language processors: Three new chapters on Overview of language processors,
  Data structures for language processors, and Scanning and parsing techniques have
  been added. These are followed by chapters on Assemblers, Macro processors,
  Compilers and interpreters, and Linkers.
• Process management: Process management is structured into chapters on Proc-
  esses, Scheduling, Deadlocks, Process synchronization, and Interprocess commu-
  nication.
• Information management: Information management is now organized in the form
  of chapters on IO organization and IO programming, File systems, and Protection
  and security.
Apart from this, some parts of the text have been completely rewritten and new defini-
tions, examples, figures, sections added and exercises and bibliographies updated. New
sections on user interfaces, resource instance and resource request models and distributed.
control algorithms have been added in the chapters on Software tools, Deadlocks and
Distributed operating systems, respectively.
I hope instructors and students will like the new look of the book. Feedback from readers,
preferably by email ([email protected]), is welcome. I thank my wife and family for
their forbearance.
D. M. Dhamdhere
Part I
SYSTEMS PROGRAMMING
CHAPTER 1
Language Processors
1.1 INTRODUCTION
Language processing activities arise due to the differences between the manner in
which a software designer describes the ideas concerning the behaviour of a soft-
ware and the manner in which these ideas are implemented in a computer system.
The designer expresses the ideas in terms related to the application domain of the
software. To implement these ideas, their description has to be interpreted in terms
related to the execution domain of the computer system. We use the term semantics
to represent the rules of meaning of a domain, and the term semantic gap to represent
the difference between the semantics of two domains. Fig. 1.1 depicts the semantic
gap between the application and execution domains.
[Figure omitted: the semantic gap between the application domain and the execution domain]
Fig. 1.1 Semantic gap
The semantic gap has many consequences, some of the important ones being
large development times, large development efforts, and poor quality of software.
These issues are tackled by software engineering through the use of methodologies
and programming languages (PLs). The software engineering steps aimed at the use
of a PL can be grouped into
1. Specification, design and coding steps
2. PL implementation steps.
Software implementation using a PL introduces a new domain, the PL domain.
The semantic gap between the application domain and the execution domain is
bridged by the software engineering steps. The first step bridges the gap between
the application and PL domains, while the second step bridges the gap between the
PL and execution domains. We refer to the gap between the application and PL do
mains as the specification-and-design gap or simply the specification gap, and the
gap between the PL and execution domains as the execution gap (see Fig. 1.2). The
specification gap is bridged by the software development team, while the execution
gap is bridged by the designer of the programming language processor, viz. a trans-
lator or an interpreter.
[Figure omitted: the specification gap between the application domain and the PL domain, and the execution gap between the PL domain and the execution domain]
Fig. 1.2 Specification and execution gaps
It is important to note the advantages of introducing the PL domain. The gap to be
bridged by the software designer is now between the application domain and the PL
domain rather than between the application domain and the execution domain. This
reduces the severity of the consequences of semantic gap mentioned earlier. Further,
apart from bridging the gap between the PL and execution domains, the language
processor provides a diagnostic capability which detects and indicates errors in its
input. This helps in improving the quality of the software. (We shall discuss the
diagnostic function of language processors in Chapters 3 and 6.)
We define the terms specification gap and execution gap as follows: Specification
gap is the semantic gap between two specifications of the same task. Execution gap
is the gap between the semantics of programs (that perform the same task) written in
different programming languages. We assume that each domain has a specification
language (SL). A specification written in an SL is a program in SL. The specification
language of the PL domain is the PL itself. The specification language of the execu-
tion domain is the machine language of the computer system. We restrict the use of
the term execution gap to situations where one of the two specification languages is
closer to the machine language of a computer system. In other situations, the term
specification gap is more appropriate.
Language processors
Definition 1.1 (Language processor) A language processor is a software which
bridges a specification or execution gap.
We use the term language processing to describe the activity performed by a lan-
guage processor and assume a diagnostic capability as an implicit part of any form of
language processing. We refer to the program form input to a language processor as
the source program and to its output as the target program. The languages in which
these programs are written are called source language and target language, respec-
tively. A language processor typically abandons generation of the target program if
it detects errors in the source program.
A spectrum of language processors is defined to meet practical requirements.
1. A language translator bridges an execution gap to the machine language (or
assembly language) of a computer system. An assembler is a language transla-
tor whose source language is assembly language. A compiler is any language
translator which is not an assembler.
2. A detranslator bridges the same execution gap as the language translator, but
in the reverse direction.
3. A preprocessor is a language processor which bridges an execution gap but is
not a language translator.
4. A language migrator bridges the specification gap between two PLs.
Example 1.1 Figure 1.3 shows two language processors. The language processor of part (a)
converts a C++ program into a C program, hence it is a preprocessor. The language
processor of part (b) is a language translator for C++ since it produces a machine
language program. In both cases the source program is in C++. The target programs
are the C program and the machine language program, respectively.
[Figure omitted: (a) a C++ program fed to a C++ preprocessor yielding a C program, (b) a C++ program fed to a C++ translator yielding a machine language program; both report errors]
Fig. 1.3 Language processors
Interpreters
An interpreter is a language processor which bridges an execution gap without gener-
ating a machine language program. In the classification arising from Definition 1.1,
the interpreter is a language translator. This leads to many similarities between trans-
lators and interpreters. From a practical viewpoint many differences also exist be-
tween translators and interpreters.
The absence of a target program implies the absence of an output interface of
the interpreter. Thus the language processing activities of an interpreter cannot be
separated from its program execution activities. Hence we say that an interpreter
'executes' a program written in a PL. In essence, the execution gap vanishes totally.
Figure 1.4 is a schematic representation of an interpreter, wherein the interpreter do-
main encompasses the PL domain as well as the execution domain. Thus, the spec-
ification language of the PL domain is identical with the specification language of
the interpreter domain. Since the interpreter also incorporates the execution domain,
it is as if we have a computer system capable of ‘understanding’ the programming
language. We discuss principles of interpretation in Section 1.2.2.
[Figure omitted: the interpreter domain encompassing both the PL domain and the execution domain]
Fig. 1.4 Interpreter
Problem oriented and procedure oriented languages
The three consequences of the semantic gap mentioned at the start of this section
are in fact the consequences of a specification gap. Software systems are poor in
quality and require large amounts of time and effort to develop due to difficulties in
bridging the specification gap. A classical solution is to develop a PL such that the PL
domain is very close or identical to the application domain. PL features now directly
model aspects of the application domain, which leads to a very small specification
gap (see Fig. 1.5). Such PLs can only be used for specific applications, hence they
are called problem oriented languages. They have large execution gaps; however, this
is acceptable because the gap is bridged by the translator or interpreter and does not
concern the software designer.
A procedure oriented language provides general purpose facilities required in
most application domains. Such a language is independent of specific application
domains and results in a large specification gap which has to be bridged by an appli-
cation designer.Language Processors 5
[Figure omitted: a problem oriented language domain close to the application domain, leaving a small specification gap and a large execution gap]
Fig. 1.5 Problem oriented language domain
1.2 LANGUAGE PROCESSING ACTIVITIES
The fundamental language processing activities can be divided into those that bridge
the specification gap and those that bridge the execution gap. We name these activi-
ties as
1. Program generation activities
2. Program execution activities.
A program generation activity aims at automatic generation of a program. The source
language is a specification language of an application domain and the target language
is typically a procedure oriented PL. A program execution activity organizes the
execution of a program written in a PL on a computer system. Its source language
could be a procedure oriented language or a problem oriented language.
1.2.1 Program Generation
Figure 1.6 depicts the program generation activity. The program generator is a soft-
ware system which accepts the specification of a program to be generated, and gen-
erates a program in the target PL. In effect, the program generator introduces a new
domain between the application and PL domains (see Fig. 1.7). We call this the
program generator domain. The specification gap is now the gap between the appli-
cation domain and the program generator domain. This gap is smaller than the gap
between the application domain and the target PL domain.
[Figure omitted: a program specification fed to a program generator, which produces a program in the target PL]
Fig. 1.6 Program generation
Reduction in the specification gap increases the reliability of the generated pro-
gram. Since the generator domain is close to the application domain, it is easy for
the designer or programmer to write the specification of the program to be generated.
[Figure omitted: the program generator domain between the application domain and the target PL domain, with the specification gap on its left and the execution gap on its right]
Fig. 1.7 Program generator domain
The harder task of bridging the gap to the PL domain is performed by the generator.
This arrangement also reduces the testing effort. Proving the correctness of the pro-
gram generator amounts to proving the correctness of the transformation of Fig. 1.6.
This would be performed while implementing the generator. To test an application
generated by using the generator, it is necessary to only verify the correctness of the
specification input to the program generator. This is a much simpler task than ver-
ifying correctness of the generated program. This task can be further simplified by
providing a good diagnostic (i.e. error indication) capability in the program generator
which would detect inconsistencies in the specification.
It is more economical to develop a program generator than to develop a prob-
lem oriented language. This is because a problem oriented language suffers a very
large execution gap between the PL domain and the execution domain (see Fig. 1.5),
whereas the program generator has a smaller semantic gap to the target PL domain,
which is the domain of a standard procedure oriented language. The execution gap
between the target PL domain and the execution domain is bridged by the compiler
or interpreter for the PL.
Example 1.2 A screen handling program (also called a form fillin program) handles screen
IO in a data entry environment. It displays the field headings and default values for
various fields in the screen and accepts data values for the fields. Figure 1.8 shows a
screen for data entry of employee information. A data entry operator can move the
cursor to a field and key in its value. The screen handling program accepts the value
and stores it in a data base.
A screen generator generates screen handling programs. It accepts a specification of
the screen to be generated (we will call it the screen spec) and generates a program
that performs the desired screen handling. The specification for some fields in Fig. 1.8
could be as follows:
    Employee name : char : start(line=2,position=25)
                           end(line=2,position=80)
    Married       : char : start(line=10,position=25)
                           end(line=10,position=27)
                           default('Yes')
Errors in the specification, e.g. invalid start or end positions or conflicting specifica-
tions for a field, are detected by the generator. The generated screen handling program
validates the data during data entry, e.g. the age field must only contain digits, the sex
field must only contain M or F, etc.
[Figure omitted: a data entry screen showing the fields Employee Name, Address and Married, with Married defaulted to Yes]
Fig. 1.8 Screen displayed by a screen handling program
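To make the generation step concrete, the following minimal sketch parses a simplified screen spec and emits a screen handling routine. It is written in Python purely as an illustration; the one-line spec form and the names parse_spec and generate_handler are assumptions of this sketch, not part of the text.

    # A hypothetical screen generator sketch for a simplified, one-line
    # form of the screen spec of Example 1.2.
    import re

    SPEC_LINE = re.compile(r"(?P<name>[A-Za-z ]+?)\s*:\s*(?P<type>\w+)\s*:"
                           r"\s*start\(line=(?P<line>\d+),position=(?P<pos>\d+)\)")

    def parse_spec(spec_text):
        """Analyse the screen spec; unparsable lines are reported as errors."""
        fields, errors = [], []
        for line in spec_text.strip().splitlines():
            m = SPEC_LINE.match(line.strip())
            if m:
                fields.append(m.groupdict())
            else:
                errors.append(line)          # the generator's diagnostic capability
        return fields, errors

    def generate_handler(fields):
        """Generate the target program (here also Python) for the screen."""
        body = [f"    record['{f['name']}'] = input('{f['name']}: ')"
                for f in fields]
        return "def handle_screen(record):\n" + "\n".join(body)

    fields, errors = parse_spec("Employee name : char : start(line=2,position=25)")
    print(generate_handler(fields))

The point of the sketch is the division of labour described above: the specification stays close to the application domain, and the generator alone bridges the gap to the target PL.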
1.2.2 Program Execution
Two popular models for program execution are translation and interpretation.
Program translation
The program translation model bridges the execution gap by translating a program
written in a PL, called the source program (SP), into an equivalent program in the
machine or assembly language of the computer system, called the target program
(TP) (see Fig. 1.9).
[Figure omitted: a source program fed to a translator, producing a target program in machine language]
Fig. 1.9 Program translation model
Characteristics of the program translation model are:
• A program must be translated before it can be executed.
• The translated program may be saved in a file. The saved program may be
  executed repeatedly.
• A program must be retranslated following modifications.
Program interpretation
Figure 1.10(a) shows a schematic of program interpretation. The interpreter reads the
source program and stores it in its memory. During interpretation it takes a source
statement, determines its meaning and performs actions which implement it. This
includes computational and input-output actions.
To understand the functioning of an interpreter, note the striking similarity be-
tween the interpretation schematic (Fig. 1.10(a)) and a schematic of the execution of
a machine language program by the CPU of a computer system (Fig. 1.10(b)). The
CPU uses a program counter (PC) to note the address of the next instruction to be
executed. This instruction is subjected to the instruction execution cycle consisting
of the following steps:
1. Fetch the instruction.
2. Decode the instruction to determine the operation to be performed, and also its
operands.
3. Execute the instruction.
At the end of the cycle, the instruction address in PC is updated and the cycle is
repeated for the next instruction. Program interpretation can proceed in an analogous
manner. Thus, the PC can indicate which statement of the source program is to be
interpreted next. This statement would be subjected to the interpretation cycle, which
could consist of the following steps:
1. Fetch the statement.
2. Analyse the statement and determine its meaning, viz. the computation to be
performed and its operands.
3. Execute the meaning of the statement.
[Figure omitted: (a) an interpreter holding the source program, PC and data in its memory, (b) a CPU executing a machine language program held in memory]
Fig. 1.10 Schematics of (a) interpretation, (b) program execution
From this analogy, we can identify the following characteristics of interpretation:
• The source program is retained in the source form itself, i.e. no target program
  form exists.
• A statement is analysed during its interpretation.
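As an illustration of the interpretation cycle, consider the following minimal sketch in Python. The toy statement form 'target = var + number' and every name in it are assumptions made for this example only:

    # A sketch of the fetch-analyse-execute cycle for a toy language in
    # which every statement has the form 'target = var + number'.
    def interpret(program):
        memory = {}                     # values of program variables
        pc = 0                          # statement to be interpreted next
        while pc < len(program):
            stmt = program[pc]                   # 1. fetch the statement
            target, expr = stmt.split("=")       # 2. analyse the statement:
            var, num = expr.split("+")           #    its meaning and operands
            memory[target.strip()] = (           # 3. execute the meaning
                memory.get(var.strip(), 0) + int(num))
            pc += 1                              # update PC; repeat the cycle
        return memory

    print(interpret(["a = a + 1", "b = a + 2"]))    # {'a': 1, 'b': 3}

Note that no target program is produced at any stage; a statement is analysed afresh each time it is fetched.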
Section 6.6 contains a detailed description of interpretation.
A fixed cost (the translation overhead) is incurred in the use of the program transla-
tion model. If the source program is modified, the translation cost must be incurred
again irrespective of the size of the modification. However, execution of the target
program is efficient since the target program is in the machine language. Use of the
interpretation model does not incur the translation overheads. This is advantageous
if a program is modified between executions, as in program testing and debugging.
Interpretation is however slower than execution of a machine language program be-
cause of Step 2 in the interpretation cycle.
1.3 FUNDAMENTALS OF LANGUAGE PROCESSING
Definition 1.2 (Language Processing)
Language Processing = Analysis of SP + Synthesis of TP.
Definition 1.2 motivates a generic model of language processing activities. We
refer to the collection of language processor components engaged in analysing a
source program as the analysis phase of the language processor. Components en-
gaged in synthesizing a target program constitute the synthesis phase.
A specification of the source language forms the basis of source program analy-
sis. The specification consists of three components:
1. Lexical rules which govern the formation of valid lexical units in the source
language.
2. Syntax rules which govern the formation of valid statements in the source lan-
guage.
3. Semantic rules which associate meaning with valid statements of the language.
The analysis phase uses each component of the source language specification to
determine relevant information concerning a statement in the source program. Thus,
analysis of a source statement consists of lexical, syntax and semantic analysis.
Example 1.3 Consider the statement
    percent_profit := (profit * 100) / cost_price;
in some programming language. Lexical analysis identifies :=, * and / as operators,
100 as a constant and the remaining strings as identifiers. Syntax analysis identifies the
statement as an assignment statement with percent_profit as the left hand side and
(profit * 100) / cost_price as the expression on the right hand side. Semantic
analysis determines the meaning of the statement to be the assignment of
    (profit × 100) / cost_price
to percent_profit.
The synthesis phase is concerned with the construction of target language state-
ment(s) which have the same meaning as a source statement. Typically, this consists
of two main activities:
• Creation of data structures in the target program
• Generation of target code.
We refer to these activities as memory allocation and code generation, respectively.
Example 1.4 A language processor generates the following assembly language statements
for the source statement of Ex. 1.3.
    MOVER  AREG, PROFIT
    MULT   AREG, 100
    DIV    AREG, COST_PRICE
    MOVEM  AREG, PERCENT_PROFIT
    PERCENT_PROFIT  DW  1
    PROFIT          DW  1
    COST_PRICE      DW  1
where MOVER and MOVEM move a value from a memory location to a CPU register
and vice versa, respectively, and DW reserves one or more words in memory. Need-
less to say, both memory allocation and code generation are influenced by the target
machine's architecture.
Phases and passes of a language processor
From the preceding discussion it is clear that a language processor consists of two
distinct phases—the analysis phase and the synthesis phase. Figure 1.11 shows a
schematic of a language processor. This schematic, as also Examples 1.3 and 1.4, may
give the impression that language processing can be performed on a statement-by-
statement basis—that is, analysis of a source statement can be immediately followed
by synthesis of equivalent target statements. This may not be feasible due to:
• Forward references
• Issues concerning memory requirements and organization of a language pro-
  cessor.
[Figure omitted: source program fed to the analysis phase, which feeds the synthesis phase to yield the target program; both phases report errors]
Fig. 1.11 Phases of a language processor
We discuss these issues in the following.
Definition 1.3 (Forward reference) A forward reference of a program entity is a refer-
ence to the entity which precedes its definition in the program.
While processing a statement containing a forward reference, a language proces-
sor does not possess all relevant information concerning the referenced entity. This
creates difficulties in synthesizing the equivalent target statements. This problem
can be solved by postponing the generation of target code until more information
concerning the entity becomes available. Postponing the generation of target code
may also reduce memory requirements of the language processor and simplify its
organization.
Example 1.5 Consider the statement of Ex. 1.3 to be a part of the following program in
some programming language:
    percent_profit := (profit * 100) / cost_price;
    long profit;
The statement long profit; declares profit to have a double precision value. The
reference to profit in the assignment statement constitutes a forward reference be-
cause the declaration of profit occurs later in the program. Since the type of profit
is not known while processing the assignment statement, correct code cannot be gen-
erated for it in a statement-by-statement manner.
Departure from the statement-by-statement application of Definition 1.2 leads to
the multipass model of language processing.
Definition 1.4 (Language processor pass) A language processor pass is the processing
of every statement in a source program, or its equivalent representation, to perform
a language processing function (or a set of language processing functions).
Here 'pass' is an abstract noun describing the processing performed by the lan-
guage processor. For simplicity, the part of the language processor which performs
one pass over the source program is also called a pass.
Example 1.6 It is possible to process the program fragment of Ex. 1.5 in two passes as
follows:
Pass I:  Perform analysis of the source program and
         note relevant information
Pass II: Perform synthesis of the target program
Information concerning the type of profit is noted in pass I. This information is used
during pass II to perform code generation.
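The two passes of Ex. 1.6 can be sketched as follows. This is a minimal illustration in Python, assuming a toy language with just the two statement forms of Ex. 1.5; every name below is hypothetical, not the book's design:

    import re

    # Pass I: analyse the source program and note relevant information.
    def pass_one(source):
        symtab = {}
        for stmt in source:
            if stmt.startswith("long "):              # e.g. 'long profit;'
                symtab[stmt[5:].rstrip(";").strip()] = "long"
        return symtab

    # Pass II: synthesize the target program using the noted information.
    def pass_two(source, symtab):
        code = []
        for stmt in source:
            if ":=" in stmt:
                target, expr = stmt.rstrip(";").split(":=")
                for ident in re.findall(r"[a-z_]+", expr):
                    # types of forward-referenced entities are now known
                    code.append(f"; {ident} : {symtab.get(ident, 'real')}")
                code.append(f"EVAL {expr.strip()} -> {target.strip()}")
        return code

    src = ["percent_profit := (profit * 100) / cost_price;", "long profit;"]
    print(pass_two(src, pass_one(src)))

Because pass I has already recorded that profit is long, pass II can select correct code for the assignment even though the declaration appears after the reference.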
Intermediate representation of programs
The language processor of Ex. 1.6 performs certain processing more than once. In
pass I it analyses the source program to note the type information. In pass II, it once
again analyses the source program to generate target code using the type information
noted in pass I. This can be avoided using an intermediate representation of the
source program.
Definition 1.5 (Intermediate Representation (IR)) An intermediate representation (IR)
is a representation of a source program which reflects the effect of some, but not all,
analysis and synthesis tasks performed during language processing.
The IR is the 'equivalent representation' mentioned in Definition 1.4. Note that
the words ‘but not all’ in Definition 1.5 differentiate between the target program and
an IR. Figure 1.12 depicts the schematic of a two pass language processor. The first
pass performs analysis of the source program and reflects its results in the intermedi-
ate representation. The second pass reads and analyses the IR, instead of the source
program, to perform synthesis of the target program. This avoids repeated processing
of the source program. The first pass is concerned exclusively with source language
issues. Hence it is called the front end of the language processor. The second pass is
concerned with program synthesis for a specific target language. Hence it is called
the back end of the language processor. Note that the front and back ends of a lan-
guage processor need not coexist in memory. This reduces the memory requirements
of a language processor.
[Figure omitted: source program fed to the front end, which produces the intermediate representation (IR); the back end reads the IR and produces the target program]
Fig. 1.12 Two pass schematic for language processing
Desirable properties of an IR are:
• Ease of use: IR should be easy to construct and analyse.
• Processing efficiency: efficient algorithms must exist for constructing and
  analysing the IR.
• Memory efficiency: IR must be compact.
Like the pass structure of language processors, the nature of intermediate repre-
sentation is influenced by many design and implementation considerations. In the
following sections we will focus on the fundamental issues in language processing.
Wherever possible and relevant, we will comment on suitable IR forms.
Semantic actions
As seen in the preceding discussions, the front end of a language processor analyses
the source program and constructs an IR. All actions performed by the front end,
except lexical and syntax analysis, are called semantic actions. These include actions
for the following:
1. Checking semantic validity of constructs in SP
2. Determining the meaning of SP
3. Constructing an IR.
1.3.1 A Toy Compiler
We briefly describe the front end and back end of a toy compiler for a Pascal-like
language.
1.3.1.1 The Front End
The front end performs lexical, syntax and semantic analysis of the source program.
Each kind of analysis involves the following functions:
1. Determine validity of a source statement from the viewpoint of the analysis.
2. Determine the ‘content’ of a source statement.
3. Construct a suitable representation of the source statement for use by subse-
quent analysis functions, or by the synthesis phase of the language processor.
The word 'content' has different connotations in lexical, syntax and semantic
analysis. In lexical analysis, the content is the lexical class to which each lexical unit
belongs, while in syntax analysis it is the syntactic structure of a source statement.
In semantic analysis the content is the meaning of a statement—for a declaration
statement, it is the set of attributes of a declared variable (e.g. type, length and di-
mensionality), while for an imperative statement, it is the sequence of actions implied
by the statement.
Each analysis represents the ‘content’ of a source statement in the form of (1) ta-
bles of information, and (2) description of the source statement. Subsequent analysis
uses this information for its own purposes and either adds information to these tables
and descriptions, or constructs its own tables and descriptions. For example, syntax
analysis uses information concerning the lexical class of lexical units and constructs
a representation for the syntactic structure of the source statement. Semantic analysis
uses information concerning the syntactic structure and constructs a representation
for the meaning of the statement. The tables and descriptions at the end of semantic
analysis form the IR of the front end (see Fig. 1.12).
Output of the front end
The IR produced by the front end consists of two components:
1. Tables of information
2. An intermediate code (IC) which is a description of the source program.
Tables
Tables contain the information obtained during different analyses of SP. The most
important table is the symbol table which contains information concerning all iden-
tifiers used in the SP. The symbol table is built during lexical analysis. Semantic
analysis adds information concerning symbol attributes while processing declaration
statements. It may also add new names designating temporary results.
Intermediate code (IC)
The IC is a sequence of IC units, each IC unit representing the meaning of one action
in SP. IC units may contain references to the information in various tables.
Example 1.7 Figure 1.13 shows the IR produced by the analysis phase for the program
    i : integer;
    a, b : real;
    a := b+i;
Symbol table
     symbol    type    length    address
1    i         int
2    a         real
3    b         real
4    i*        real
5    temp      real
Intermediate code
1. Convert (Id, #1) to real, giving (Id, #4)
2. Add (Id, #4) to (Id, #3), giving (Id, #5)
3. Store (Id, #5) in (Id, #2)
Fig. 1.13 IR for the program of Example 1.7
The symbol table contains information concerning the identifiers and their types. This
information is determined during lexical and semantic analysis, respectively. In IC,
the specification (Id, #1) refers to the id occupying the first entry in the table. Note that
i* and temp are temporary names added during semantic analysis of the assignment
statement.
Lexical analysis (Scanning)
Lexical analysis identifies the lexical units in a source statement. It then classifies
the units into different lexical classes, e.g. id's, constants, reserved id's, etc. and
enters them into different tables. This classification may be based on the nature of a
string or on the specification of the source language. (For example, while an integer
constant is a string of digits with an optional sign, a reserved id is an id whose name
matches one of the reserved names mentioned in the language specification.) Lexical
analysis builds a descriptor, called a token, for each lexical unit. A token contains
two fields—class code, and number in class. class code identifies the class to which
a lexical unit belongs, number in class is the entry number of the lexical unit in the
relevant table. We depict a token as (Code #no), e.g. (Id #10). The IC for a statement
is thus a string of tokens.
Example 1.8 The statement a := b+i; is represented as the string of tokens
    (Id #2) (Op #3) (Id #3) (Op #4) (Id #1) (Op #10)
where (Id #2) stands for 'identifier occupying entry #2 in the symbol table', i.e.
a (see Fig. 1.13). (Op #3) similarly stands for the operator ':=', etc.
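A scanner along these lines can be sketched in Python as follows. This is purely an illustration: the operator numbering, the regular expression, and all names are assumptions of the sketch, so the token indices differ from those shown above.

    import re

    OPERATORS = {":=": 1, "+": 2, "*": 3, ";": 4}     # assumed operator table

    def scan(stmt, symtab):
        """Classify lexical units and emit (class, number-in-class) tokens."""
        tokens = []
        for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|:=|[-+*/;]", stmt):
            if lexeme in OPERATORS:
                tokens.append(("Op", OPERATORS[lexeme]))
            elif lexeme.isdigit():
                tokens.append(("Const", lexeme))
            else:                                     # an identifier
                symtab.setdefault(lexeme, len(symtab) + 1)   # new table entry
                tokens.append(("Id", symtab[lexeme]))
        return tokens

    symtab = {"i": 1, "a": 2, "b": 3}                 # entries of Fig. 1.13
    print(scan("a := b+i;", symtab))
    # [('Id', 2), ('Op', 1), ('Id', 3), ('Op', 2), ('Id', 1), ('Op', 4)]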
Syntax analysis (Parsing)
Syntax analysis processes the string of tokens built by lexical analysis to determine
the statement class, e.g. assignment statement, if statement, etc. It then builds an IC
which represents the structure of the statement. The IC is passed to semantic analysis
to determine the meaning of the statement.
Example 1.9 Figure 1.14 shows IC for the statements a, b : real; and a := b+i;. A
tree form is chosen for IC because a tree can represent the hierarchical structure of a
PL statement appropriately. Each node in a tree is labelled by an entity. For simplicity,
we use the source form of an entity, rather than its token. IC for the assignment
statement shows that the computation b+i is a part of the expression occurring on the
RHS of the assignment.
[Figure omitted: tree-form IC for the two statements]
Fig. 1.14 IC for the statements a, b : real; and a := b+i;
Semantic analysis
Semantic analysis of declaration statements differs from the semantic analysis of
imperative statements. The former results in addition of information to the symbol
table, e.g. type, length and dimensionality of variables. The latter identifies the
sequence of actions necessary to implement the meaning of a source statement. In
both cases the structure of a source statement guides the application of the semantic
rules. When semantic analysis determines the meaning of a subtree in the IC, it adds
information to a table or adds an action to the sequence of actions. It then modifies
the IC to enable further semantic analysis. The analysis ends when the tree has been
completely processed. The updated tables and the sequence of actions constitute the
IR produced by the analysis phase.
Example 1.10 Semantic analysis of the statement a := b+i; proceeds as follows:
1. Information concerning the type of the operands is added to the IC tree. The
   IC tree now looks as in Fig. 1.15(a).
2. Rules of meaning governing an assignment statement indicate that the expres-
   sion on the right hand side should be evaluated first. Hence focus shifts to the
   right subtree rooted at '+'.
3. Rules of addition indicate that type conversion of i should be performed to
   ensure type compatibility of the operands of '+'. This leads to the action
   (i) Convert i to real, giving i*.
   which is added to the sequence of actions. The IC tree under consideration is
   modified to represent the effect of this action (see Fig. 1.15(b)). The symbol
   i* is now added to the symbol table.
[Figure omitted: the IC tree of a := b+i; annotated with operand types and transformed in steps (a), (b) and (c)]
Fig. 1.15 Steps in semantic analysis of an assignment statement
4. Rules of addition indicate that the addition is now feasible. This leads to the
   action
   (ii) Add i* to b, giving temp.
   The IC tree is transformed as shown in Fig. 1.15(c), and temp is added to the
   symbol table.
5. The assignment can be performed now. This leads to the action
   (iii) Store temp in a.
This completes semantic analysis of the statement. Note that the IC generated here is
identical with that shown in Fig. 1.13.
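The action sequence of this example can be reproduced by a small tree-walking routine. The following Python sketch is an illustration under stated assumptions: the (op, left, right) tuple form of the tree and all names are hypothetical.

    # Semantic analysis of ':=' with a (op, left, right) expression tree.
    def analyse_assignment(target, expr, symtab, actions):
        op, left, right = expr                    # e.g. ('+', 'b', 'i')
        if symtab[left] != symtab[right]:         # rules of addition: convert
            actions.append(f"Convert {right} to {symtab[left]}, giving {right}*")
            symtab[right + "*"] = symtab[left]    # i* enters the symbol table
            right += "*"
        actions.append(f"Add {right} to {left}, giving temp")
        symtab["temp"] = symtab[left]             # temp enters the symbol table
        actions.append(f"Store temp in {target}")

    symtab = {"i": "int", "a": "real", "b": "real"}
    actions = []
    analyse_assignment("a", ("+", "b", "i"), symtab, actions)
    # actions: ['Convert i to real, giving i*',
    #           'Add i* to b, giving temp', 'Store temp in a']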
Figure 1.16 shows the schematic of the front end where arrows indicate flow of
data.
[Figure omitted: the front end stages—scanning, parsing and semantic analysis—reporting lexical, syntax and semantic errors respectively, exchanging tokens and trees, and building the symbol table, constants table and other tables]
Fig. 1.16 Front end of the toy compiler
1.3.1.2 The Back End
The back end performs memory allocation and code generation.
Memory allocation
Memory allocation is a simple task given the presence of the symbol table. The
memory requirement of an identifier is computed from its type, length and dimen-
sionality, and memory is allocated to it. The address of the memory area is entered
in the symbol table.
Example 1.11 After memory allocation, the symbol table looks as shown in Fig. 1.17. The
entries for i* and temp are not shown because memory allocation is not needed for
these id's.
     symbol    type    length    address
1    i         int               2000
2    a         real              2001
3    b         real              2002
Fig. 1.17 Symbol table after memory allocation
Note that certain decisions have to precede memory allocation, for example,
whether i* and temp of Ex. 1.10 should be allocated memory. These decisions
are taken in the preparatory steps of code generation.
Code generation
Code generation uses knowledge of the target architecture, viz. knowledge of in-
structions and addressing modes in the target computer, to select the appropriate
instructions. The important issues in code generation are:18 Systems Programming & Operating Systems
1. Determine the places where the intermediate results should be kept, i.e. whether
   they should be kept in memory locations or held in machine registers. This is
   a preparatory step for code generation.
2. Determine which instructions should be used for type conversion operations.
3. Determine which addressing modes should be used for accessing variables.
Example 1.12 For the sequence of actions for the assignment statement a := b+i; in
Ex. 1.10, viz.
    (i)   Convert i to real, giving i*.
    (ii)  Add i* to b, giving temp.
    (iii) Store temp in a.
the synthesis phase may decide to hold the values of i* and temp in machine registers,
and may generate the assembly code
    CONV_R  AREG, I
    ADD_R   AREG, B
    MOVEM   AREG, A
where CONV_R converts the value of I into the real representation and leaves the result
in AREG. ADD_R performs the addition in real mode and MOVEM puts the result into the
memory area allocated to A.
Some issues involved in code generation may require the designer to look beyond
machine architecture. For example, whether or not the value of temp should be stored
in a memory location in Ex. 1.12 would partly depend on whether the value of b+i
is used more than once in the program. This is an aspect of code optimization.
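A sketch of how the back end might map the action sequence of Ex. 1.10 onto such an instruction set follows. The mnemonics follow Ex. 1.12, but holding every intermediate result in AREG and the name generate are assumptions of this Python illustration:

    def generate(actions):
        """Select instructions for each semantic action; results stay in AREG."""
        code = []
        for act in actions:
            words = act.replace(",", "").split()
            if act.startswith("Convert"):
                code.append(f"CONV_R AREG, {words[1].upper()}")
            elif act.startswith("Add"):
                code.append(f"ADD_R  AREG, {words[3].upper()}")
            elif act.startswith("Store"):
                code.append(f"MOVEM  AREG, {words[3].upper()}")
        return code

    acts = ["Convert i to real, giving i*",
            "Add i* to b, giving temp",
            "Store temp in a"]
    print(generate(acts))   # ['CONV_R AREG, I', 'ADD_R  AREG, B', 'MOVEM  AREG, A']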
Figure 1.18 shows a schematic of the back end.
[Figure omitted: the back end performs memory allocation and code generation, consulting the symbol table, constants table and other tables produced by the front end]
Fig. 1.18 Back end of the toy compiler
1.4 FUNDAMENTALS OF LANGUAGE SPECIFICATION
As mentioned earlier, a specification of the source language forms the basis of source
program analysis. In this section, we shall discuss important lexical, syntactic and
semantic features of a programming language.
1.4.1 Programming Language Grammars
The lexical and syntactic features of a programming language are specified by its
grammar. This section discusses key concepts and notions from formal language
grammars. A language L can be considered to be a collection of valid sentences.
Each sentence can be looked upon as a sequence of words, and each word as a se-
quence of letters or graphic symbols acceptable in L. A language specified in this
manner is known as a formal language. A formal language grammar is a set of rules
which precisely specify the sentences of L. It is clear that natural languages are not
formal languages due to their rich vocabulary. However, PLs are formal languages.
Terminal symbols, alphabet and strings
The alphabet of L, denoted by the Greek symbol Σ, is the collection of symbols in
its character set. We will use lower case letters a, b, c, etc. to denote symbols in Σ.
A symbol in the alphabet is known as a terminal symbol (T) of L. The alphabet can
be represented using the mathematical notation of a set, e.g.
    Σ = { a, b, … z, 0, 1, … 9 }
Here the symbols {, ',' and } are part of the notation. We call them metasymbols
to differentiate them from terminal symbols. Throughout this discussion we assume
that metasymbols are distinct from the terminal symbols. If this is not the case, i.e.
a terminal symbol and a metasymbol are identical, we enclose the terminal symbol in
quotes to differentiate it from the metasymbol. For example, the set of punctuation
symbols of English can be defined as
    { :, ;, ',', . }
where ',' denotes the terminal symbol 'comma'.
A string is a finite sequence of symbols. We will represent strings by Greek
symbols α, β, γ, etc. Thus α = axy is a string over Σ. The length of a string is the
number of symbols in it. Note that the absence of any symbol is also a string, the
null string ε. The concatenation operation combines two strings into a single string.
It is used to build larger strings from existing strings. Thus, given two strings α and
β, concatenation of α with β yields a string which is formed by putting the sequence
of symbols forming α before the sequence of symbols forming β. For example, if α
= ab, β = axy, then concatenation of α and β, represented as α.β or simply αβ, gives
the string abaxy. The null string can also participate in a concatenation, thus α.ε =
ε.α = α.
Nonterminal symbols
A nonterminal symbol (NT) is the name of a syntax category of a language, e.g.
noun, verb, etc. An NT is written as a single capital letter, or as a name enclosed
between <…>, e.g. A or < Noun >. During grammatical analysis, a nonterminal
symbol represents an instance of the category. Thus, < Noun > represents a noun.
Productions
A production, also called a rewriting rule, is a rule of the grammar. A production has
the form
    A nonterminal symbol ::= String of Ts and NTs
and defines the fact that the NT on the LHS of the production can be rewritten as
the string of Ts and NTs appearing on the RHS. When an NT can be written as one
of many different strings, the symbol '|' (standing for 'or') is used to separate the
strings on the RHS, e.g.
    < Article > ::= a | an | the
The string on the RHS of a production can be a concatenation of component
strings, e.g. the production
    < Noun Phrase > ::= < Article > < Noun >
expresses the fact that the noun phrase consists of an article followed by a noun.
Each grammar G defines a language L_G. G contains an NT called the distin-
guished symbol or the start NT of G. Unless otherwise specified, we use the symbol
S as the distinguished symbol of G. A valid string α of L_G is obtained by using the
following procedure:
1. Let α = S.
2. While α is not a string of terminal symbols
   (a) Select an NT appearing in α, say X.
   (b) Replace X by a string appearing on the RHS of a production of X.
Example 1.13 Grammar (1.1) defines a language consisting of noun phrases in English:
    < Noun Phrase > ::= < Article > < Noun >
    < Article >     ::= a | an | the                            (1.1)
    < Noun >        ::= boy | apple
< Noun Phrase > is the distinguished symbol of the grammar. the boy and an
apple are some valid strings in the language.
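The derivation procedure above is easy to mechanize. The following Python sketch applies it to grammar (1.1); the dictionary encoding of the grammar and the use of random choice for steps 2(a) and 2(b) are illustrative assumptions:

    import random

    GRAMMAR = {                                   # grammar (1.1)
        "<Noun Phrase>": [["<Article>", "<Noun>"]],
        "<Article>":     [["a"], ["an"], ["the"]],
        "<Noun>":        [["boy"], ["apple"]],
    }

    def derive(start="<Noun Phrase>"):
        alpha = [start]                                      # 1. let alpha = S
        while any(s in GRAMMAR for s in alpha):              # 2. alpha has an NT
            i = next(k for k, s in enumerate(alpha)
                     if s in GRAMMAR)                        # 2(a) select an NT
            alpha[i:i+1] = random.choice(GRAMMAR[alpha[i]])  # 2(b) replace it
        return " ".join(alpha)

    print(derive())    # e.g. 'the boy' or 'an apple'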
Definition 1.6 (Grammar) A grammar G of a language L_G is a quadruple (Σ, SNT, S,
P) where
    Σ    is the alphabet of L_G, i.e. the set of Ts,
    SNT  is the set of NTs,
    S    is the distinguished symbol, and
    P    is the set of productions.
Derivation, reduction and parse trees
A grammar G is used for two purposes, to generate valid strings of L_G and to 'rec-
ognize' valid strings of L_G. The derivation operation helps to generate valid strings
while the reduction operation helps to recognize valid strings. A parse tree is used
to depict the syntactic structure of a valid string as it emerges during a sequence of
derivations or reductions.
Derivation
Let production P1 of grammar G be of the form
    P1 : A ::= α
and let β be a string such that β = γAδ, then replacement of A by α in string β
constitutes a derivation according to production P1. We use the notation N ⇒ η to
denote direct derivation of η from N and N ⇒* η to denote transitive derivation of
η (i.e. derivation in zero or more steps) from N, respectively. Thus, A ⇒ α only if
A ::= α is a production of G and A ⇒* δ if A ⇒ … ⇒ δ. We can use this notation
to define a valid string according to a grammar G as follows: δ is a valid string
according to G only if S ⇒* δ, where S is the distinguished symbol of G.
Example 1.14 Derivation of the string the boy according to grammar (1.1) can be depicted
as
    < Noun Phrase > ⇒ < Article > < Noun >
                    ⇒ the < Noun >
                    ⇒ the boy
A string α such that S ⇒* α is a sentential form of L_G. The string α is a sentence
of L_G if it consists of only Ts.
Example 1.15 Consider the grammar G
    < Sentence >    ::= < Noun Phrase > < Verb Phrase >
    < Noun Phrase > ::= < Article > < Noun >
    < Verb Phrase > ::= < Verb > < Noun Phrase >                (1.2)
    < Article >     ::= a | an | the
    < Noun >        ::= boy | apple
    < Verb >        ::= ate
The following strings are sentential forms of L_G:
    < Noun Phrase > < Verb Phrase >
    the boy < Verb Phrase >
    < Noun Phrase > ate < Noun Phrase >
    the boy ate < Noun Phrase >
    the boy ate an apple
However, only the boy ate an apple is a sentence.
Reduction
Let production P1 of grammar G be of the form
    P1 : A ::= α
and let β be a string such that β = γαδ, then replacement of α by A in string β
constitutes a reduction according to production P1. We use the notations η → N and
η →* N to depict direct and transitive reduction, respectively. Thus, α → A only if
A ::= α is a production of G and α →* A if α → … → A. We define the validity of
some string δ according to grammar G as follows: δ is a valid string of L_G if δ →* S,
where S is the distinguished symbol of G.
Example 1.16 To determine the validity of the string
    the boy ate an apple
according to grammar (1.2) we perform the following reductions:
    Step    String
            the boy ate an apple
    1       < Article > boy ate an apple
    2       < Article > < Noun > ate an apple
    3       < Noun Phrase > ate an apple
    4       < Noun Phrase > < Verb > an apple
    5       < Noun Phrase > < Verb > < Article > apple
    6       < Noun Phrase > < Verb > < Article > < Noun >
    7       < Noun Phrase > < Verb > < Noun Phrase >
    8       < Noun Phrase > < Verb Phrase >
    9       < Sentence >
The string is a sentence of L_G since we are able to construct the reduction sequence
the boy ate an apple →* < Sentence >.
Parse trees
A sequence of derivations or reductions reveals the syntactic structure of a string with
respect to G. We depict the syntactic structure in the form of a parse tree. Derivation
according to the production A ::= α gives rise to an elemental parse tree in which
the node A has the sequence of Ts and NTs constituting α as its children. A subsequent
step in the derivation replaces an NT in α, say NT_i, by a string. We can build another
elemental parse tree to depict this derivation, and combine the two trees by replacing
the node of NT_i in the first tree by this tree. In essence, the parse tree has grown
in the downward direction due to a derivation. We can obtain a parse tree from a
sequence of reductions by performing the converse actions. Such a tree would grow
in the upward direction.
Example 1.17 Figure 1.19 shows the parse tree of the string the boy ate an apple ob-
tained using the reductions of Ex. 1.16. The superscript associated with a node in the
tree indicates the step in the reduction sequence which led to the subtree rooted at that
node. Reduction steps 1 and 2 lead to reduction of the and boy to < Article > and
< Noun >, respectively. Step 3 combines the parse trees of < Article > and < Noun >
to give the subtree rooted at < Noun Phrase >.
[Figure omitted: parse tree of the boy ate an apple, with superscripts on the nodes indicating the corresponding reduction steps of Ex. 1.16]
Fig. 1.19 Parse tree
Note that an identical tree would have been obtained if the boy ate an apple was
derived from S.
Recursive specification
Grammar (1.3) is a complete grammar for an arithmetic expression containing the
operators ↑ (exponentiation), * and +.
    < exp >     ::= < exp > + < term > | < term >
    < term >    ::= < term > * < factor > | < factor >
    < factor >  ::= < factor > ↑ < primary > | < primary >
    < primary > ::= < id > | < const > | ( < exp > )            (1.3)
    < id >      ::= < letter > | < id > [ < letter > | < digit > ]
    < const >   ::= [ + | − ] < digit > | < const > < digit >
    < letter >  ::= a | b | c | … | z
    < digit >   ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
This grammar uses the notation known as the Backus Naur Form (BNF). Apart
from the familiar elements ::=, | and <…>, a new element here is […], which is
used to enclose an optional specification. Thus, the rules for < id > and < const >
in grammar (1.3) are equivalent to the rules
    < id >    ::= < letter > | < id > < letter > | < id > < digit >
    < const > ::= < digit > | + < digit > | − < digit >
                | < const > < digit >
Grammar (1.3) uses recursive specification, whereby the NT being defined in
a production itself occurs in a RHS string of the production, e.g. X ::= … X ….
The RHS alternative employing recursion is called a recursive rule. Recursive rules
simplify the specification of recurring constructs.
Example 1.18 A non-recursive specification for expressions containing the '+' operator
would have to be written as
    < exp > ::= < term > | < term > + < term >
              | < term > + < term > + < term > | …
Using recursion, < exp > can be specified simply as
    < exp > ::= < exp > + < term > | < term >                   (1.4)
The first alternative on the RHS of grammar (1.4) is recursive. It permits an
unbounded number of '+' operators in an expression. The second alternative is non-
recursive. It provides an 'escape' from recursion while deriving or recognizing ex-
pressions according to the grammar. Recursive rules are classified into left-recursive
rules and right-recursive rules depending on whether the NT being defined appears
on the extreme left or extreme right in the recursive rule. For example, all recursive
rules of grammar (1.3) are left-recursive rules. Indirect recursion occurs when two
or more NTs are defined in terms of one another. Such recursion is useful for speci-
fying nested constructs in a language. In grammar (1.3), the alternative ( < exp > )
of < primary > gives rise to indirect recursion because < exp > ⇒* < primary >. This
specification permits a parenthesized expression to occur in any context where an
identifier or constant can occur.
Direct recursion is not useful in situations where a limited number of occurrences
is required. For example, the recursive specification
    < id > ::= < letter > | < id > [ < letter > | < digit > ]
permits an identifier string to contain an unbounded number of characters, which is
not correct. In such cases, controlled recurrence may be specified as
    < id > ::= < letter > { < letter > | < digit > }₀¹⁵
where the notation {…}₀¹⁵ indicates 0 to 15 occurrences of the enclosed specifica-
tion.
1.4.1.1 Classification of Grammars
Grammars are classified on the basis of the nature of productions used in them
(Chomsky, 1963). Each grammar class has its own characteristics and limitations.
Type-0 grammars
These grammars, known as phrase structure grammars, contain productions of the
form
    α ::= β
where both α and β can be strings of Ts and NTs. Such productions permit arbitrary
substitution of strings during derivation or reduction, hence they are not relevant to
specification of programming languages.
Type-1 grammars
‘These grammars are known as context sensitive grammars because their productions
specify that derivation or reduction of strings can take place only in specific contexts.
A Type- production has the form
aABssanBp
‘Thus, a string 7 ina sentential form can be replaced by ‘A’ (or vice versa) only when it
is enclosed by the strings ct and f.. These grammars are also not particularly relevant
for PL specification since recognition of PL constructs is not context sensitive in
nature.26 Systems Programming & Operating Systems
Type-2 grammars
These grammars impose no context requirements on derivations or reductions. A
typical Type-2 production is of the form
    A ::= π
which can be applied independent of its context. These grammars are therefore
known as context free grammars (CFG). CFGs are ideally suited for programming
language specification. Two best known uses of Type-2 grammars in PL specifica-
tion are the ALGOL-60 specification (Naur, 1963) and Pascal specification (Jensen,
Wirth, 1975). The reader can verify that grammars (1.2) and (1.3) are Type-2 gram-
mars.
Type-3 grammars
Type-3 grammars are characterized by productions of the form
    A ::= tB | t    or
    A ::= Bt | t
Note that these productions also satisfy the requirements of Type-2 grammars. The
specific form of the RHS alternatives—namely a single T or a string containing a
single T and a single NT—gives some practical advantages in scanning (we shall see
this aspect in Chapter 6). However, the nature of the productions restricts the expres-
sive power of these grammars, e.g. nesting of constructs or matching of parentheses
cannot be specified using such productions. Hence the use of Type-3 productions is
restricted to the specification of lexical units, e.g. identifiers, constants, labels, etc.
The productions for < id > and < const > in grammar (1.3) are in fact Type-
3 in nature. This can be seen clearly when we rewrite the production for < id > in
the form A ::= Bt | t, viz.
    < id > ::= < id > l | < id > d | l | d
where l and d stand for a letter and digit respectively.
Type-3 grammars are also known as linear grammars or regular grammars.
These are further categorized into left-linear and right-linear grammars depending
on whether the NT in the RHS alternative appears at the extreme left or extreme
right.
Operator grammars
Definition 1.7 (Operator grammar (OG)) An operator grammar is a grammar none of
whose productions contain two or more consecutive NTs in any RHS alternative.
Thus, nonterminals occurring in an RHS string are separated by one or more
terminal symbols. All terminal symbols occurring in the RHS strings are called
operators of the grammar. As we will discuss later in Chapter 6, OGs have certain
practical advantages in compiler writing.
Example 1.19 Grammar (1.3) is an OG. '+', '*' and '↑' are the operators of the
grammar.
1.4.1.2 Ambiguity in Grammatic Specification
Ambiguity implies the possibility of different interpretations of a source string. In
natural languages, ambiguity may concern the meaning or syntax category of a word,
or the syntactic structure of a construct. For example, a word can have multiple
meanings or can be both noun and verb (e.g. the word 'base'), and a sentence can
have multiple syntactic structures (e.g. ‘police ordered to stop speeding on roads’).
Formal language grammars avoid ambiguity at the level of a lexical unit or a syntax
category. This is achieved by the simple rule that identical strings cannot appear on
the RHS of more than one production in the grammar. Existence of ambiguity at the
level of the syntactic structure of a string would mean that more than one parse tree
can be built for the string. In turn, this would mean that the string can have more
than one meaning associated with it.
Example 1.20 Consider the expression grammar
    < exp > ::= < exp > + < exp > | < exp > * < exp > | < id >
    < id >  ::= a | b | c                                       (1.5)
Two parse trees exist for the source string a+b*c according to this grammar—one
in which a+b is first reduced to < exp > and another in which b*c is first reduced to
< exp >. Since semantic analysis derives the meaning of a string on the basis of its
parse tree, clearly two different meanings can be associated with the string.
Eliminating ambiguity
An ambiguous grammar should be rewritten to eliminate ambiguity. In Ex. 1.20, the
first tree does not reflect the conventional meaning associated with a+b*c, while the
second tree does. Hence the grammar must be rewritten such that reduction of '*'
precedes the reduction of '+' in a+b*c. The normal method of achieving this is to
use a hierarchy of NTs in the grammar, and to associate the reduction or derivation
of an operator with an appropriate NT.
Example 1.21 Figure 1.20 illustrates reduction of a+b*c according to grammar (1.3). Part (a)
depicts an attempt to reduce a+b to < exp >. This attempt fails because the resulting
string < exp > * c cannot be reduced to < exp >. Part (b) depicts the correct reduc-
tion of a+b*c in which b*c is first reduced to < term >. This sequence of reductions
can be explained as follows: Grammar (1.3) associates the recognition of '*' with
reduction of a string to a < term >, which alone can take part in a reduction involving
'+'. Consequently, in a+b*c, '*' has to be necessarily reduced before '+'. This yields
the conventional meaning of the string. Other NTs, viz. < factor > and < primary >,
similarly take care of the operator '↑' and the parentheses '(…)'. Hence there is no
ambiguity in grammar (1.3).
[Figure omitted: two reduction sequences for a+b*c according to grammar (1.3): in (a) reducing a+b first leads to a string that cannot be reduced further, in (b) b*c is reduced to < term > first and the reduction succeeds]
Fig. 1.20 Ensuring a unique parse tree for an expression
EXERCISE 1.4
1. In grammar (1.3), identify productions which could belong to
   (a) an operator grammar,
   (b) a linear grammar.
2. Write productions for the following
   (a) a decimal constant with or without a fractional part,
   (b) a real number with mantissa and exponent specification.
3. In grammar (1.3) what are the priorities of '+', '*' and '↑' with respect to one another?
4. In grammar (1.3) add productions to incorporate relational and boolean operators.
5. Associativity of an operator indicates the order in which consecutive occurrences of
   the operator in a string are reduced. For example '+' is left associative, i.e. in a+b+c,
   a+b is performed first, followed by the addition of c to its result.
   (a) Find the associativities of operators in grammar (1.3).
   (b) Exponentiation should be right associative so that a↑b↑c has the conventional
       meaning a↑(b↑c). What changes should be made in grammar (1.3) to implement
       right associativity for ↑?
   (c) Is the grammar of problem 3 of Exercise 3.2.2 ambiguous? If so, give a string
       which has multiple parses.
1.4.2 Binding and Binding Times
Each program entity pe_i in program P has a set of attributes A_i = {a_j} associated
with it. If pe_i is an identifier, it has an attribute kind whose value indicates whether
it is a variable, a procedure or a reserved identifier (i.e. a keyword). A variable
has attributes like type, dimensionality, scope, memory address, etc. Note that the
attribute of one program entity may be another program entity. For example, type is