
Learning Materials, CD, Unit-1 (Btech-5th Sem)


Uploaded by banakumar ghosh

DH SIR'S CLASSROOM

LEARNING MATERIALS
Department : Computer Science & Engineering Semester : 5TH

Subject : COMPILER DESIGN

UNIT : (1) Introduction to Compiling [3L] Course Code : Contact Periods : 3 Hrs

Teacher : Debasish Hati Date : ________________ ____

[Instructions: At the end of this lesson, the teacher is asked to give MCQ / short-type / broad-type
questions related to this unit.]
__________________________________________________________________________
Content:

1. Statement of a problem: Recognizing basic elements; Recognizing syntactic units and interpreting
meaning; Intermediate form: Arithmetic statements, Non-arithmetic statements, Non-executable statements;
Storage Allocation;
Code Generation: Optimization (M/c independent), Optimization (M/c dependent); Assembly Phase; General
Model of Compiler.
2.Phases of Compiler

Compilers, Analysis of the source program, The phases of the compiler, Cousins of the compiler.
………………………………………………………………………………………………………………………………..

1. Compilers

• A compiler is a program that translates code written in a high-level programming language into machine code
or an intermediate language. The compiler ensures the code adheres to the syntax and semantics of the source
language and optimizes it for efficient execution on a target machine.

2. Analysis of the Source Program

• Source program analysis involves breaking down the input program into a form that the compiler can process.
This step includes:
o Lexical Analysis: Tokenizes the input code by dividing it into meaningful symbols like keywords, operators,
and identifiers.
o Syntax Analysis: Checks the source code structure to ensure it follows the grammar rules of the programming
language.
o Semantic Analysis: Verifies that the meaning of the source code is correct, for example through type
checking and by ensuring that variable usage adheres to the language rules.

3. Phases of the Compiler

• Compilers are generally divided into two main parts: analysis and synthesis. These are further broken down
into specific phases:
1. Lexical Analysis: Tokenizes the input, turning it into symbols the compiler can understand.
2. Syntax Analysis: Constructs a parse tree from the tokens, checking the syntax against the language grammar.
3. Semantic Analysis: Checks the meaning of the syntax, including type checking.
4. Intermediate Code Generation: Transforms the code into an intermediate representation that abstracts
away machine-specific details.
5. Code Optimization: Improves the intermediate code for efficiency in memory and execution time.
6. Code Generation: Converts the optimized intermediate code into machine code.
7. Code Linking and Loading: Merges code from libraries and modules, preparing it for execution. (Strictly,
this step is carried out by the linker and loader rather than by the compiler itself.)

The phases of a compiler transform high-level code into machine code in a systematic, multi-step process. Each
phase has a specific function and collectively ensures the correctness, optimization, and efficient execution of
code.

1. Lexical Analysis

• Purpose: Breaks down the source code into tokens (the smallest meaningful units, such as keywords, identifiers,
literals, and symbols).
• Process: The lexical analyzer reads the input source code character by character and converts it into a stream
of tokens. Each token has a type (such as identifier, number, or operator) and a value (e.g., the identifier name or
numeric value).
• Output: A sequence of tokens that represents the source code.
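The character-by-character conversion into (type, value) tokens can be sketched in Python. This is a minimal illustration, not the lexer of any particular compiler: it assumes a toy language whose tokens are identifiers, integer literals, the operators + - * / =, parentheses and the semicolon, and the token-type names (NUMBER, IDENT, OP and so on) are our own choice.

```python
import re

# Token specification for a toy language (type names are illustrative only).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SEMI", r";"),
    ("SKIP", r"\s+"),   # whitespace separates tokens
    ("ERROR", r"."),    # any other character is illegal
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return the list of (type, value) tokens for a source string."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, value = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace is not itself a token
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {value!r}")
        tokens.append((kind, value))
    return tokens


print(tokenize("cost = rate * (start - finish) + 2;"))
```

The alternatives are tried in list order at each position, so the catch-all ERROR pattern only fires when nothing else matches.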

2. Syntax Analysis (Parsing)

• Purpose: Checks the source code’s grammatical structure based on the rules of the programming language.
• Process: The parser takes the tokens from the lexical analyzer and arranges them into a parse tree or syntax
tree. This tree represents the hierarchical structure of statements in the code.
• Output: A syntax tree that shows the structure and nested relationships of the tokens.
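One common way to build such a tree is recursive descent, with one function per grammar rule. The sketch below is an illustration under assumptions of our own: a small expression grammar (shown in the docstring) and a tree made of nested (operator, left, right) tuples. Giving * and / their own rule is what makes them bind tighter than + and -.

```python
import re


def tokenize(text):
    # Identifiers, integers, and single-character operators/parentheses.
    return re.findall(r"[A-Za-z_]\w*|\d+|[-+*/()]", text)


class Parser:
    """Recursive-descent parser for the grammar
         expr   -> term (('+' | '-') term)*
         term   -> factor (('*' | '/') factor)*
         factor -> IDENT | NUMBER | '(' expr ')'
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())  # left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        tok = self.advance()
        if tok == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or tok in ("+", "-", "*", "/", ")"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok  # leaf: identifier or number


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("rate * (start - finish) + 2"))
# ('+', ('*', 'rate', ('-', 'start', 'finish')), '2')
```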

3. Semantic Analysis

• Purpose: Ensures the code adheres to the language’s semantic rules, focusing on meaning rather than form.
• Process: During semantic analysis, the compiler checks for correct type usage, identifier scope, array bounds
(where they are known at compile time), and function calls with the correct number and type of arguments. It
often works on an abstract syntax tree (AST) derived from the syntax tree, adding semantic information such as
types to it.
• Output: An annotated syntax tree or AST that incorporates the results of the semantic checks.
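A type check can be written as a recursive walk over the tree that computes each node's type and reports undeclared identifiers. This is a minimal sketch, assuming a hypothetical symbol table (name to type) left behind by earlier declarations, only the types int and float, and the usual rule that mixing int with float yields float.

```python
# Hypothetical symbol table built from earlier declarations: name -> type.
SYMBOLS = {"rate": "float", "start": "int", "finish": "int"}

ARITHMETIC_OPS = ("+", "-", "*", "/")


def check(node, symbols):
    """Return the type of an expression tree, raising on semantic errors.

    Leaves are identifier names or integer literals (strings);
    inner nodes are (op, left, right) tuples, as a parser might build them."""
    if isinstance(node, tuple):
        op, left, right = node
        if op not in ARITHMETIC_OPS:
            raise TypeError(f"unknown operator {op!r}")
        left_type, right_type = check(left, symbols), check(right, symbols)
        # Usual arithmetic conversion: if either side is float, so is the result.
        return "float" if "float" in (left_type, right_type) else "int"
    if node.isdigit():
        return "int"
    if node not in symbols:
        raise NameError(f"undeclared identifier {node!r}")
    return symbols[node]


print(check(("*", "rate", ("-", "start", "finish")), SYMBOLS))  # float
```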

4. Intermediate Code Generation

• Purpose: Converts the syntax tree or AST into an intermediate representation (IR) that abstracts away machine-
specific details.
• Process: The compiler translates the tree into a form that is neither high-level source code nor machine code
and that is easy to optimize. Common IR forms include three-address code and control flow graphs.
• Output: Intermediate code that is simpler than the source code and portable across multiple architectures.
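Three-address code, one of the IR forms named above, can be produced by a post-order walk of an expression tree: each operator node gets a fresh temporary that holds its result. A minimal sketch, assuming a tree of nested (op, left, right) tuples and temporaries named t1, t2, ... (our own naming).

```python
import itertools


def gen_tac(tree):
    """Translate an expression tree of (op, left, right) tuples into
    three-address code. Returns (instructions, name_of_result)."""
    code = []
    temps = (f"t{i}" for i in itertools.count(1))

    def walk(node):
        if not isinstance(node, tuple):
            return node  # identifiers and constants need no instruction
        op, left, right = node
        a, b = walk(left), walk(right)  # operands are evaluated first
        t = next(temps)
        code.append(f"{t} = {a} {op} {b}")
        return t

    result = walk(tree)
    return code, result


code, result = gen_tac(("+", ("*", "rate", ("-", "start", "finish")), "2"))
for line in code:
    print(line)
# t1 = start - finish
# t2 = rate * t1
# t3 = t2 + 2
```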

5. Code Optimization

• Purpose: Improves the intermediate code to reduce execution time, memory usage, or both.
• Process: Optimization can be local (applied within a small section of code, such as a basic block) or global
(applied across a whole procedure or program). Techniques include constant folding, dead code elimination, loop
unrolling, and inline expansion. Every transformation must preserve the program's meaning while improving its
performance.
• Output: Optimized intermediate code.
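Constant folding, one of the techniques listed, evaluates at compile time any operation whose operands are already known constants. The sketch below assumes straight-line three-address code given as (dest, arg1, op, arg2) tuples and also propagates folded values into later instructions; the example instructions are made up for illustration.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(code):
    """Constant folding with simple constant propagation on straight-line
    three-address code given as (dest, arg1, op, arg2) tuples."""
    known = {}  # name -> constant value proved at compile time
    out = []
    for dest, a, op, b in code:
        a, b = known.get(a, a), known.get(b, b)  # substitute known constants
        if isinstance(a, int) and isinstance(b, int):
            value = OPS[op](a, b)                # evaluate now, not at run time
            known[dest] = value
            out.append(f"{dest} = {value}")
        else:
            known.pop(dest, None)                # dest no longer holds a constant
            out.append(f"{dest} = {a} {op} {b}")
    return out


code = [
    ("t1", 60, "*", 60),       # seconds per hour
    ("t2", "t1", "*", 24),     # seconds per day
    ("t3", "days", "*", "t2"),
]
print(fold_constants(code))
# ['t1 = 3600', 't2 = 86400', 't3 = days * 86400']
```

If t1 and t2 are not used anywhere else, their assignments are now dead code, which a later dead code elimination pass could remove.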
6. Code Generation

• Purpose: Converts the optimized intermediate code into machine code or assembly language for a specific target
architecture.
• Process: The code generator maps intermediate code operations to machine instructions. It selects appropriate
registers, assigns memory locations, and generates actual machine instructions that the hardware can execute.
The generated code is in binary or assembly format, depending on the compiler design.
• Output: Target-specific machine code or assembly code.

7. Code Linking and Loading (Optional)

• Purpose: Combines the generated code with libraries and prepares the program for execution.
• Process: The linker takes the generated machine code, adds external libraries or modules, and resolves references
to functions and variables defined in other files. This is especially important in languages like C and C++, where
modular programming relies on separate compilation of modules.
• Output: An executable program that is ready to be loaded and executed.

4. Cousins of the Compiler

The cousins of the compiler are system software tools related to the compiler in function and purpose. They
each contribute to the process of transforming and executing code but operate differently. Here are the main
cousins:

1. Preprocessor
o The preprocessor processes the source code before it reaches the compiler. It handles tasks like
macro expansion, file inclusion (e.g., #include in C/C++), and conditional compilation (#ifdef
and #endif).
o It modifies the source code based on directives and then passes the modified code to the compiler.
2. Assembler
o The assembler converts assembly language code into machine code or object code.
o Assemblers are crucial in converting low-level instructions written in assembly into binary code
that the computer can execute directly.
3. Interpreter
o An interpreter executes a program line-by-line or statement-by-statement without converting
the entire code to machine code in advance.
o Unlike a compiler, an interpreter translates high-level code on the fly, which makes it useful for
debugging but generally slower for execution compared to compiled programs.
4. Linker
o The linker takes multiple object files (often generated by the compiler) and combines them into
a single executable program.
o It resolves references between different modules, combining library code and ensuring all necessary
resources are available in the final executable.
5. Loader
o The loader places the executable program into memory and prepares it for execution.
o It handles memory allocation, linking of shared libraries (dynamic linking), and setting up the
environment so the program can run smoothly on the operating system.
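The preprocessor's macro expansion (item 1) is easy to picture with a toy version. This sketch handles only object-like macros of the form #define NAME replacement; it is far simpler than a real C preprocessor (no function-like macros, #include, #ifdef, or string-literal handling), and preprocess is a hypothetical name.

```python
import re

DEFINE = re.compile(r"\s*#define\s+([A-Za-z_]\w*)\s+(.*)")


def preprocess(source):
    """Expand object-like macros written as '#define NAME replacement'.

    Directive lines are dropped from the output, and every later whole-word
    occurrence of NAME is replaced. Function-like macros, #include, #ifdef
    and string literals are deliberately not handled in this sketch."""
    macros = {}
    output = []
    for line in source.splitlines():
        match = DEFINE.match(line)
        if match:
            macros[match.group(1)] = match.group(2).strip()
            continue
        for name, text in macros.items():
            # A function replacement stops re from treating backslashes in text specially.
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, t=text: t, line)
        output.append(line)
    return "\n".join(output)


print(preprocess("#define SIZE 10\nint a[SIZE];\nint b = SIZE * 2;"))
```

The modified text, with no directives left in it, is what would then be handed to the compiler proper.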

Intro:

Compiler: A compiler is a "magic box" that converts a high-level language program into a machine
language program.
OR
A compiler is a software program that converts a program written in a high-level language into machine
language, which can be executed by a computer.
High-Level Language → Compiler → Machine-Level Language

5.1 General Model of Compiler or Simple Structure of Compiler


The compilation process is a sequence of various phases. Each phase takes input from its previous
stage, has its own representation of source program, and feeds its output to the next phase of the
compiler. The general model of a compiler consists of 7 distinct phases:
1. Lexical analysis
2. Syntax analysis
3. Interpretation phase
4. Machine independent optimization
5. Storage assignment
6. Code generation
7. Assembly and output

i. Lexical analysis: Recognition of basic elements (tokens) and creation of uniform symbols.
ii. Syntax analysis: Recognition of basic syntactic constructs through reductions.
iii. Interpretation phase: Definition of the exact meaning of each construct, and creation of the matrix
and tables by action routines.
iv. Machine independent optimization: Creation of a more optimal matrix by removing duplicate
entries from it.
v. Storage assignment: Makes entries in the matrix that allow the code generation phase to create
code that allocates dynamic storage, and that allow the assembly phase to reserve the proper
amount of static storage.
vi. Code generation: A macro processor is used to produce more optimal assembly code.
vii. Assembly and output: Resolves symbolic addresses and generates machine language.

5.2 The Databases Used


i. Source code: The program written by the user (the user program).
ii. Uniform symbol table: Consists of a list of all the tokens (basic elements) as they appear
in the program. It is created by the lexical analysis phase and given as input to the syntax analysis and
interpretation phases.
iii. Terminal table: A permanent table that contains all the terminal symbols of the language
(operators, keywords and separators). The lexical analysis phase uses it to classify tokens.
iv. Identifier table: Contains all variables in the program and temporary storage, together with the
information needed to reference or allocate storage for them. This table is created by
lexical analysis.
v. Literal table: Contains all constants in the program.
vi. Reductions: A permanent table of decision rules in the form of patterns for matching
with the uniform symbol table to discover syntactic structure.
vii. Matrix: The intermediate form of the program, created by the action routines. It is
optimized and then used for code generation.
viii. Code productions: A permanent table of definitions. There is one entry defining code
for each matrix operator.
ix. Assembly code: The assembly language version of the program, created by the
code generation phase and used as input to the assembly phase.
x. Relocatable object code: The final output of the assembly phase, ready to be used as
input to the loader.

Consider a simple example


WCM: procedure (Rate, Start, finish);
Declare (Cost, Rate, Start, Finish) fixed binary (31) static;
Cost=Rate *(Start- Finish) +2*Rate*(Start-Finish-100);
Return (Cost);
End;

5.3.1 Lexical Analysis Phase


The lexical phase performs the following three tasks:
1. Recognize the basic elements (tokens) present in the source code

Compiler Design | UNIT-1 | Prepared By: Debasish Hati, In-charge, DCST


2. Build the literal and identifier tables
3. Build the uniform symbol table

Recognizing the basic elements- Tokens of example program

Databases: The lexical phase manipulates five databases:


i. Source program
ii. Terminal table
iii. Literal table
iv. Identifier table
v. Uniform symbol table
i. Source program: The original form of the program created by the user.
ii. Terminal table: A permanent database consisting of three fields:
• Symbol: operators, keywords and separators (such as (, ; and :)
• Indicator: YES for operators and separators, NO for keywords
• Precedence: used in a later phase
Step Symbol Indicator Precedence
1 : Yes
2 ; Yes
3 ( Yes
4 ) Yes
5 , Yes
6 * Yes
7 Declare No
8 Procedure No
9 + Yes
10 * Yes
11 Rate No
12 Start No

iii. Literal table: Describes all literal constants used in the source program. It consists of six fields:
Literals Base Scale Precision Other information Address
31 Decimal Fixed 2
2 Decimal Fixed 1
100 Decimal Fixed 3
iv. Identifier table: Describes all identifiers used in the source program. It consists of three fields:
Name Data attribute Address



WCM
RATE
START
FINISH
COST
v. Uniform symbol table: Consists of a list of all the tokens (basic elements) as they appear in
the program; it is created by the lexical analysis phase. There is one uniform symbol for every token
in the program. Each uniform symbol has two fields, the table class and the index into that table
(the Token column below is shown only for reference):
Table class Index Token
IDN 1 WCM
TRM 1 :
TRM 8 Procedure
TRM 3 (
IDN 2 Rate
TRM 5 ,
IDN 3 Start
TRM 5 ,
IDN 4 Finish
TRM 4 )
TRM 2 ;

Algorithm:

Step 1: Parse the input character string into tokens.

Step 2: Make the appropriate entries in the tables.

Implementation:

i. The input string is separated into tokens by break characters. Break characters are
denoted by the contents of a special field in the terminal table.
ii. Lexical analysis recognizes three types of tokens: terminal symbols [TRM], identifiers [IDN] and literals [LIT].
iii. For each token:
     if the token is found in the TERMINAL table then
         create a uniform symbol of type TRM
     else if the token is an IDENTIFIER then
         enter it in the identifier table (if not already present)
         create a uniform symbol of type IDN
     else
         enter it in the literal table (if not already present)
         create a uniform symbol of type LIT
     end if
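The classification in step iii can be tried on the WCM example. The following Python sketch is an illustration, not the author's implementation. It assumes a terminal table whose first eight entries follow the table above; entry 10 is taken to be "-", and the keywords from entry 11 onward are added only so that the whole program can be scanned. Indices are 1-based, as in the tables.

```python
import re

# Terminal table: entries 1-8 follow the table above; the remaining entries
# are assumed here so that the whole WCM example can be scanned.
TERMINALS = [":", ";", "(", ")", ",", "*", "DECLARE", "PROCEDURE",
             "+", "-", "=", "FIXED", "BINARY", "STATIC", "RETURN", "END"]
# Break characters end a token and are terminal symbols themselves.
TOKEN = re.compile(r"[:;(),*+\-=]|\w+")


def lexical_phase(source):
    """Return (identifier table, literal table, uniform symbol table)."""
    identifiers, literals, uniform = [], [], []
    for token in TOKEN.findall(source):
        key = token.upper()  # keywords and names are treated case-insensitively
        if key in TERMINALS:
            uniform.append(("TRM", TERMINALS.index(key) + 1))
        elif token[0].isdigit():
            if token not in literals:
                literals.append(token)
            uniform.append(("LIT", literals.index(token) + 1))
        else:
            if key not in identifiers:
                identifiers.append(key)
            uniform.append(("IDN", identifiers.index(key) + 1))
    return identifiers, literals, uniform


WCM = """WCM: procedure (Rate, Start, finish);
Declare (Cost, Rate, Start, Finish) fixed binary (31) static;
Cost=Rate *(Start- Finish) +2*Rate*(Start-Finish-100);
Return (Cost);
End;"""

ids, lits, uniform = lexical_phase(WCM)
print(ids)           # ['WCM', 'RATE', 'START', 'FINISH', 'COST']
print(lits)          # ['31', '2', '100']
print(uniform[:11])  # uniform symbols of the first statement
```

The first eleven uniform symbols reproduce the uniform symbol table above, and the identifier and literal tables come out in the same order as the tables shown earlier.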

5.3.2 Syntax Phase:


The functions of the syntax phase are:
1. To recognize the major constructs of the language
2. To call the appropriate action routines that will generate the intermediate form, or
matrix, for these constructs
Databases: The syntax analysis phase manipulates three databases:
i. Uniform symbol table: The table created by the lexical phase. The uniform symbols are the
source of input to the stack, which is used by the syntax and interpretation phases.



Fields of each uniform symbol: Table class, Index

ii. Stack: The stack is the collection of uniform symbols currently being worked on. It is
organized in LIFO (last in, first out) order.

iii. Reduction table: The syntax rules of the source language are contained in the reduction
table. The general form of each reduction (rule) is:
Label: old top of stack / action routine / new top of stack / next reduction
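How reductions and action routines cooperate can be shown with a toy reduction table. This is a hypothetical sketch: it recognizes only statements of the form id = id {(+|-) id} ;, treats literals as identifiers, and tries the rules in order instead of following the "next reduction" field. Each rule gives the old top of stack, an action routine, and the new top of stack. The action routines build matrix entries, writing an assignment as (=, target, value).

```python
import re


def classify(token):
    """Syntactic class of a token (a simplified stand-in for a uniform symbol)."""
    if token in ("+", "-"):
        return "op"
    if token in ("=", ";"):
        return token
    return "id"  # identifiers and, in this toy, literals too


def recognize(statement):
    """Recognize 'id = id {(+|-) id} ;' with a reduction table and return the
    matrix built by the action routines as (operator, operand1, operand2)."""
    matrix = []

    def entry(op, a, b):
        # Action routine: add a matrix entry and return its name (M1, M2, ...).
        matrix.append((op, a, b))
        return f"M{len(matrix)}"

    # Each reduction: old top of stack, action routine, new top of stack.
    # The action routine maps the values of the old top to the new top's values.
    reductions = [
        (("=", "id"), lambda v: v, ("=", "E")),
        (("E", "op", "id"), lambda v: [entry(v[1], v[0], v[2])], ("E",)),
        (("id", "=", "E", ";"), lambda v: [entry("=", v[0], v[2])], ("S",)),
    ]

    stack = []  # (class, value) pairs; the last element is the top
    for token in re.findall(r"\w+|[-+=;]", statement):
        stack.append((classify(token), token))
        reduced = True
        while reduced:  # keep applying reductions until none matches
            reduced = False
            for old_top, action, new_top in reductions:
                n = len(old_top)
                if tuple(cls for cls, _ in stack[-n:]) == old_top:
                    values = action([value for _, value in stack[-n:]])
                    del stack[-n:]
                    stack.extend(zip(new_top, values))
                    reduced = True
                    break
    if [cls for cls, _ in stack] != ["S"]:
        raise SyntaxError("statement not recognized")
    return matrix


print(recognize("A = B + C - D;"))
# [('+', 'B', 'C'), ('-', 'M1', 'D'), ('=', 'A', 'M2')]
```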

5.3.3 Interpretation Phase:


The interpretation phase is a collection of routines (action routines) that are called when a construct is recognized.
The purpose of the action routines is to create an intermediate form of the source program and to
add information to the identifier table. While the syntax phase recognizes the syntactic constructs, the
interpretation phase records their precise meaning in the matrix or the identifier table.
Databases:

i. Uniform symbol table


ii. Identifier table
iii. Stack
iv. Matrix: The primary intermediate form of the program. Each matrix entry is a triplet: the
first element is a uniform symbol denoting the terminal symbol for the operator, and the other two
elements are uniform symbols denoting the arguments.
Operator Operand 1 Operand 2
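For example:
B=A
A=C*D*(C*D+B)
Operator Operand 1 Operand 2
M1 = B A
M2 * C D
M3 + M2 B
M4 * C D
M5 * M4 M3
M6 = A M5
(For the = operator, operand 1 is the variable being assigned and operand 2 is the value assigned to it.)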
5.3.4 Optimization Phase:
In this model, optimization means removing duplicate entries from the matrix and modifying all references to
the deleted entries. The optimizations performed by a compiler are of two types:
i. Machine dependent optimization is related to the instructions that get generated,
so it is incorporated into the code generation phase.
ii. Machine independent optimization is done in a separate phase.
Databases

i. Matrix: This is the major database in the optimization phase


Operator Operand 1 Operand 2 Backward pointer Forward pointer



ii. Identifier table
iii. Literal table
Algorithm:

Step 1: Place the matrix in a form in which common subexpressions can be recognized
Step 2: Recognize two subexpressions as being equivalent
Step 3: Eliminate one of them
Step 4: Alter the rest of the matrix to reflect the elimination of this entry

For example:
B=A
A=C*D*(C*D+B)
Step 1:
Operator Operand 1 Operand 2 Backward pointer Forward pointer
M1 = B A 0 2
M2 * C D 1 3
M3 + M2 B 2 4
M4 * C D 3 5
M5 * M4 M3 4 6
M6 = A M5 5 ?

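Step 2: Entries M2 and M4 both compute C * D, and neither C nor D is assigned between them, so they are
recognized as equivalent.

Steps 3 & 4: M4 is eliminated, the reference to M4 in the old M5 is changed to M2, and the remaining
entries are renumbered with their pointers updated:

Opr Op1 Op2 Bk. ptr Fr. ptr
M1 = B A 0 2
M2 * C D 1 3
M3 + M2 B 2 4
M4 * M2 M3 3 5
M5 = A M4 4 ?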
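The four steps can be sketched in Python. This is an illustration under stated assumptions: entries are (operator, operand1, operand2) triples named M1, M2, ... in order; an = entry is (=, target, value); equivalence means an identical triple (commutativity is ignored); and an assignment to a variable stops earlier expressions that read it from being reused. After elimination the survivors are renumbered and their backward/forward pointers rebuilt, and None marks the end of the chain (the ? in the table).

```python
def eliminate_common_subexpressions(matrix):
    """Remove duplicate matrix entries (common subexpressions).

    matrix: list of (operator, operand1, operand2) triples, implicitly named
    M1, M2, ...; an '=' entry is (=, target, value). Returns the renumbered
    matrix as (name, operator, operand1, operand2, backward, forward) rows,
    with forward = None for the last entry."""
    rename = {}     # deleted entry -> surviving equivalent entry
    available = {}  # (operator, operand1, operand2) -> entry computing it
    kept = []       # surviving (old name, operator, operand1, operand2)
    for i, (op, a, b) in enumerate(matrix, start=1):
        name = f"M{i}"
        a, b = rename.get(a, a), rename.get(b, b)  # step 4: redirect references
        if op == "=":
            # Assigning to `a` invalidates every expression that reads it.
            available = {k: v for k, v in available.items() if a not in k[1:]}
        elif (op, a, b) in available:
            rename[name] = available[(op, a, b)]   # steps 2-3: drop the duplicate
            continue
        else:
            available[(op, a, b)] = name
        kept.append((name, op, a, b))

    # Renumber the survivors and rebuild the backward/forward pointers.
    new_name = {old: f"M{j}" for j, (old, _op, _a, _b) in enumerate(kept, start=1)}
    rows = []
    for j, (_old, op, a, b) in enumerate(kept, start=1):
        forward = j + 1 if j < len(kept) else None
        rows.append((f"M{j}", op, new_name.get(a, a), new_name.get(b, b), j - 1, forward))
    return rows


example = [("=", "B", "A"), ("*", "C", "D"), ("+", "M2", "B"),
           ("*", "C", "D"), ("*", "M4", "M3"), ("=", "A", "M5")]
for row in eliminate_common_subexpressions(example):
    print(*row)
```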

5.3.5 Storage Assignment Phase:


The purpose of this phase is to
i. Assign storage to all variables referenced in the source program
ii. Assign storage to all literals
iii. Assign storage to all temporary locations for intermediate results
iv. Ensure that the storage is allocated and appropriate locations are initialized
The storage assignment phase first scans the identifier table and assigns locations to each entry with a
storage class of static or automatic. It initializes a location counter to zero and uses it to keep track of
how much storage has been assigned. For each entry scanned, this phase does the following:
i. Updates the location counter for boundary alignment
ii. Assigns the current value of the location counter to the address field
iii. Calculates the length of storage required by the variable
iv. Updates the location counter by adding this length to it.
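Steps i to iv amount to a simple loop over a location counter. This is a minimal sketch, assuming each variable is described by a byte length and an alignment. COST and RATE are treated as 4-byte, 4-byte-aligned fullwords (our assumption for FIXED BINARY(31)). FLAG is a hypothetical 1-byte variable, not part of the WCM example, added only to show the padding that alignment introduces.

```python
def assign_storage(variables, start=0):
    """Assign addresses with a location counter.

    variables: list of (name, length_in_bytes, alignment_in_bytes).
    Returns ({name: address}, total bytes assigned)."""
    location = start  # the location counter
    addresses = {}
    for name, length, alignment in variables:
        # Step i: round the counter up to the next multiple of the alignment.
        location = (location + alignment - 1) // alignment * alignment
        addresses[name] = location  # step ii: record the address
        location += length          # steps iii-iv: advance past the variable
    return addresses, location - start


# COST and RATE: assumed 4-byte fullwords; FLAG: hypothetical 1-byte variable.
print(assign_storage([("COST", 4, 4), ("FLAG", 1, 1), ("RATE", 4, 4)]))
# ({'COST': 0, 'FLAG': 4, 'RATE': 8}, 12)
```

RATE lands at 8 rather than 5 because step i rounds the counter up to the next 4-byte boundary.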
The storage assignment phase creates a matrix entry for variables as shown below:



Storage class Size Operand
where the storage class is one of: Static, Automatic, Controlled, Based
For each variable that requires initialization, the storage assignment phase generates a matrix entry
as shown below:
Initialize variable Operand

The literal table is scanned similarly: a location is assigned to each literal and a matrix entry is generated:
LIT Size Operand

5.3.6 Code Generation Phase:


The purpose of the code generation phase is to produce appropriate code in the form of either assembly
or machine language. This phase takes the matrix as its input database and uses the code productions
table, which defines the operators that may appear in the matrix, to produce code.
Databases:
i. Matrix
ii. Identifier table
iii. Literal table
iv. Code productions: A permanent database defining the code for all possible matrix operators.
The standard code for the operators is (&N denotes the temporary location that holds the result of the
current matrix entry):
+ : L 1, &operand1
    A 1, &operand2
    ST 1, &N
- : L 1, &operand1
    S 1, &operand2
    ST 1, &N
* : L 1, &operand1
    M 1, &operand2
    ST 1, &N
= : L 1, &operand2
    ST 1, &operand1

For example: A = B + C - D
Matrix        Original code    Better code
M1 + B C      L 1, B           L 1, B
              A 1, C           A 1, C
              ST 1, M1
M2 - M1 D     L 1, M1
              S 1, D           S 1, D
              ST 1, M2
M3 = A M2     L 1, M2
              ST 1, A          ST 1, A
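The two code columns can be reproduced mechanically. In the sketch below, generate expands each matrix entry with the code productions listed above, using register 1 throughout exactly as those productions do (they are not checked against a real instruction set). improve then removes every store to a temporary Mk that is immediately reloaded. It assumes, as in this example, that each temporary is used only once. The function names are our own.

```python
import re

# Code productions: for each matrix operator, a list of (opcode, operand)
# where operand names which value to use: "a" = operand 1, "b" = operand 2,
# "n" = the temporary for this entry's own result (&N).
PRODUCTIONS = {
    "+": [("L", "a"), ("A", "b"), ("ST", "n")],
    "-": [("L", "a"), ("S", "b"), ("ST", "n")],
    "*": [("L", "a"), ("M", "b"), ("ST", "n")],
    "=": [("L", "b"), ("ST", "a")],
}


def generate(matrix):
    """Expand each matrix entry (operator, operand1, operand2) with its production."""
    code = []
    for i, (op, a, b) in enumerate(matrix, start=1):
        values = {"a": a, "b": b, "n": f"M{i}"}
        code.extend((opcode, values[which]) for opcode, which in PRODUCTIONS[op])
    return code


def improve(code):
    """Drop 'ST 1, Mk' immediately followed by 'L 1, Mk' for a temporary Mk.
    Assumes, as in the example, that every temporary is used only once."""
    better = []
    for opcode, operand in code:
        is_temp = re.fullmatch(r"M\d+", operand) is not None
        if opcode == "L" and is_temp and better and better[-1] == ("ST", operand):
            better.pop()  # register 1 still holds the value: skip store and reload
            continue
        better.append((opcode, operand))
    return better


def listing(code):
    return [f"{opcode} 1, {operand}" for opcode, operand in code]


matrix = [("+", "B", "C"), ("-", "M1", "D"), ("=", "A", "M2")]
print(listing(generate(matrix)))
print(listing(improve(generate(matrix))))
# ['L 1, B', 'A 1, C', 'S 1, D', 'ST 1, A']
```

The temporary check uses a regular expression, so a store to a program variable whose name merely starts with M (for example MAX) is never removed.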

5.3.7 Assembly Phase:


The work of the assembly phase depends on how much has been done in the code generation phase.
The assembly phase must:
i. Resolve label references in the object program
ii. Calculate addresses
iii. Generate machine language instructions
iv. Generate storage and literals
v. Format the appropriate information for the loader
Databases:
i. Identifier Table
ii. Literal table
iii. Object code
Algorithm: The assembly phase
Step 1: Scans the object code to resolve all label references and produce TXT cards.
Step 2: Then scans the identifier table to create ESD (External Symbol Dictionary) cards.
Step 3: Using the TXT and ESD cards, creates RLD (Relocation Dictionary) cards.
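Resolving label references is naturally done in two passes, because a label may be used before it is defined. This hypothetical sketch shows only that part. Pass 1 runs a location counter to record each label's address, and pass 2 substitutes those addresses. Every statement is assumed to occupy 4 bytes. The mnemonics are loosely modeled on the code above, and the program itself is made up. Operands that are never defined are collected as external symbols, the kind of information ESD cards carry; the actual TXT/ESD/RLD card formats are not modeled.

```python
def assemble(program, instruction_size=4):
    """Two-pass resolution of symbolic addresses.

    program: list of (label or None, opcode, operand or None).
    Pass 1 assigns an address to every label using a location counter;
    pass 2 replaces each operand that is a label by its address. Operands
    that are never defined are collected as external symbols."""
    symbols, location = {}, 0
    for label, _opcode, _operand in program:  # pass 1
        if label is not None:
            symbols[label] = location
        location += instruction_size           # every statement assumed 4 bytes
    text, external = [], []
    for _label, opcode, operand in program:   # pass 2
        if operand is None:
            text.append((opcode, None))
        elif operand in symbols:
            text.append((opcode, symbols[operand]))
        else:
            if operand not in external:
                external.append(operand)
            text.append((opcode, operand))  # left for the linker/loader to resolve
    return text, symbols, external


program = [
    ("LOOP", "L", "COUNT"),
    (None, "S", "ONE"),
    (None, "ST", "COUNT"),
    (None, "BAL", "PRINT"),   # PRINT: hypothetical routine in another module
    (None, "BNZ", "LOOP"),
    ("COUNT", "DS", None),
    ("ONE", "DS", None),
]
text, symbols, external = assemble(program)
print(symbols)   # {'LOOP': 0, 'COUNT': 20, 'ONE': 24}
print(external)  # ['PRINT']
```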

5.4 Passes of a Compiler
The following diagram depicts a flowchart of a compiler.
Pass 1: Corresponds to the lexical analysis phase. It scans the source program and creates the
identifier, literal and uniform symbol tables.
Pass 2: Corresponds to the syntax and interpretation phases. It scans the uniform symbol table and
produces the matrix.
Pass 3 through Pass N-3 (Passes 3 and 4 here): Correspond to the optimization phase.
Pass N-2 (Pass 5): Corresponds to the storage assignment phase.
Pass N-1 (Pass 6): Corresponds to the code generation phase. It scans the matrix.
Pass N (Pass 7): Corresponds to the assembly and output phase.

Fig: Passes of compiler

LIST OF COMPILERS



i. Ada compilers
ii. ALGOL compilers
iii. BASIC compilers
iv. C# compilers
v. C compilers
vi. C++ compilers
vii. COBOL compilers
viii. Common Lisp compilers
ix. ECMAScript interpreters
x. Fortran compilers
xi. Java compilers
xii. Pascal compilers
xiii. PL/I compilers
xiv. Python compilers
xv. Smalltalk compilers

Expected Questions from Unit-1 for the Examination

ONE Mark Questions


1. What is a compiler?
2. What is lexical analysis?
3. Define source program.
4. What is optimization?
5. Mention three tasks of the lexical analysis phase.

THREE Marks Questions


1. What are tokens? Give an example.
2. Explain the interpretation phase.
3. Explain the storage assignment phase.

FIVE Marks Questions


1. Explain the code generation phase with an example.
2. With an example, explain the optimization phase.

SEVEN Marks Questions


1. With a neat diagram, explain the general model (structure or block diagram) of a compiler.
2. Explain the databases used in compiler design.
3. With an example, explain the lexical analysis phase.
