Learning Materials, CD, Unit-1 (Btech-5th Sem)
LEARNING MATERIALS
Department : Computer Science & Engineering Semester : 5TH
UNIT : (1) Introduction to Compiling [3L] Course Code : Contact Periods : 3 Hrs
[Instructions : At the end of this lesson, teacher is asked to give MCQ / Short type questions / Broad type
Questions related to this Unit]
__________________________________________________________________________
Content:
1. Statement of a problem - Recognizing basic elements; Recognizing syntactic units and interpreting
meaning; Intermediate form: Arithmetic statements, Non-arithmetic statements, Non-executable statements;
Storage Allocation;
Code Generation: Optimization (M/c independent), Optimization (M/c dependent); Assembly Phase; General
Model of Compiler.
2. Phases of Compiler
Compilers, Analysis of the source program, The phases of the compiler, Cousins of the compiler.
………………………………………………………………………………………………………………………………..
1. Compilers
A compiler is a program that translates code written in a high-level programming language into machine code
or an intermediate language. The compiler ensures the code adheres to the syntax and semantics of the source
language and optimizes it for efficient execution on a target machine.
Source Program Analysis involves breaking down the input program into a form that the compiler can process.
This step includes:
o Lexical Analysis: Tokenizes the input code by dividing it into meaningful symbols like keywords,
operators, and identifiers.
o Syntax Analysis: Checks the source code structure to ensure it follows the grammar rules of the
programming language.
o Semantic Analysis: Verifies that the source code’s meaning is correct, for example through type-checking
and by ensuring that variable usage adheres to language rules.
Compilers are generally divided into two main parts: analysis and synthesis. These are further broken down
into specific phases:
1. Lexical Analysis: Tokenizes the input, turning it into symbols the compiler can understand.
2. Syntax Analysis: Constructs a parse tree from tokens, checking syntax against language grammar.
3. Semantic Analysis: Checks the meaning of the syntax, including type-checking.
4. Intermediate Code Generation: Transforms the code into an intermediate representation that
abstracts machine-specific details.
5. Code Optimization: Enhances the intermediate code for efficiency in memory and execution time.
6. Code Generation: Converts the optimized intermediate code into machine code.
7. Code Linking and Loading: Merges code from libraries and modules, preparing it for execution.
The phases of a compiler transform high-level code into machine code in a systematic, multi-step process. Each
phase has a specific function and collectively ensures the correctness, optimization, and efficient execution of
code.
1. Lexical Analysis
Purpose: Breaks down the source code into tokens (the smallest units like keywords, identifiers, literals, and
symbols).
Process: The lexical analyzer reads the input source code character-by-character and converts it into a stream
of tokens. Each token has a type (such as identifier, number, or operator) and value (e.g., the identifier name or
numeric value).
Output: A sequence of tokens that represent the source code.
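The tokenizing process described above can be sketched in a few lines of Python. This is a minimal illustration, not a production lexer: the token categories, the keyword set, and the names TOKEN_SPEC, KEYWORDS, and tokenize are assumptions made for the example.

```python
import re

# Hypothetical token categories; a real language defines its own.
TOKEN_SPEC = [
    ("NUMBER",     r"\d+(?:\.\d+)?"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR",   r"[+\-*/=()]"),
    ("SKIP",       r"\s+"),
    ("MISMATCH",   r"."),
]
KEYWORDS = {"if", "else", "while", "return"}  # assumed keyword set
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (type, value) pairs for the given source text."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue                      # whitespace separates tokens only
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {text!r}")
        if kind == "IDENTIFIER" and text in KEYWORDS:
            kind = "KEYWORD"              # keywords look like identifiers
        tokens.append((kind, text))
    return tokens
```

For example, tokenize("x = 42 + y") yields an identifier, an operator, a number, an operator, and an identifier, each paired with its text.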
2. Syntax Analysis
Purpose: Checks the source code’s grammatical structure based on the rules of the programming language.
Process: The parser takes the tokens from the lexical analyzer and arranges them into a parse tree or syntax
tree. This tree represents the hierarchical structure of statements in the code.
Output: A syntax tree that shows the structure and nested relationships of the tokens.
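As a rough illustration of how a parser arranges tokens into a tree, the sketch below uses recursive descent on a small arithmetic grammar. The grammar, the nested-tuple tree shape, and the function name parse_expression are assumptions for this example; real parsers are usually generated from a full language grammar.

```python
def parse_expression(tokens):
    """Parse a list of token strings into a nested-tuple syntax tree.

    Assumed grammar (for illustration only):
        expr   -> term (('+' | '-') term)*
        term   -> factor (('*' | '/') factor)*
        factor -> NAME_OR_NUMBER | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if token in ("+", "-", "*", "/", ")"):
            raise SyntaxError(f"unexpected token {token!r}")
        return advance()                  # leaf: a name or a number

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())   # left-associative
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

Calling parse_expression(["a", "+", "b", "*", "c"]) nests the multiplication under the addition, which is how the tree records operator precedence.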
3. Semantic Analysis
Purpose: Ensures the code adheres to the language’s semantic rules, focusing on the meaning rather than the
form.
Process: During semantic analysis, the compiler checks for correct type usage, identifier scope, array bounds,
and function calls with the correct number and type of arguments. It often generates an abstract syntax tree
(AST) from the syntax tree, adding semantic information.
Output: An annotated syntax tree or AST that incorporates semantic checks and annotations.
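The kind of check performed in this phase can be sketched as a small type checker over an expression tree. The tree shape (nested tuples), the symbol table as a plain dictionary, and the rule that both operands must have the same type are all assumptions chosen to keep the example short.

```python
def check_types(node, symbols):
    """Return the type name of an expression tree, or raise an error.

    node    : a name (str), a literal (int or float), or (operator, left, right)
    symbols : dict mapping declared identifier names to type names
    """
    if isinstance(node, int):
        return "int"
    if isinstance(node, float):
        return "float"
    if isinstance(node, str):
        if node not in symbols:           # scope check: identifier must be declared
            raise NameError(f"undeclared identifier {node!r}")
        return symbols[node]
    op, left, right = node
    left_type = check_types(left, symbols)
    right_type = check_types(right, symbols)
    if left_type != right_type:           # assumed rule: no implicit conversions
        raise TypeError(f"operands of {op!r} have types {left_type} and {right_type}")
    return left_type
```

With symbols = {"a": "int"}, the tree ("+", "a", 1) checks as "int", while ("+", "a", 1.5) is rejected with a TypeError.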
4. Intermediate Code Generation
Purpose: Converts the syntax tree or AST into an intermediate representation (IR) that abstracts away
machine-specific details.
Process: The compiler translates the tree into a form that is neither high-level source code nor machine code
and that is easy to optimize. The IR is designed to be flexible, with common forms including three-address code
and control flow graphs.
Output: Intermediate code that is simpler than source code and portable across multiple architectures.
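Three-address code, one of the IR forms mentioned above, can be produced by walking an expression tree and giving every intermediate result its own temporary. The sketch below assumes a nested-tuple tree and a t1, t2, ... naming scheme for temporaries; both are illustrative choices.

```python
def to_three_address(tree):
    """Flatten a nested-tuple expression tree into three-address code.

    Returns (list_of_instructions, name_holding_the_result).
    """
    code = []

    def emit(node):
        if not isinstance(node, tuple):
            return str(node)              # leaf: variable name or constant
        op, left, right = node
        left_name = emit(left)
        right_name = emit(right)
        temp = f"t{len(code) + 1}"        # hypothetical temporary naming scheme
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = emit(tree)
    return code, result
```

For the tree ("+", "a", ("*", "b", "c")), this produces the instructions "t1 = b * c" and "t2 = a + t1", with the final value in t2.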
5. Code Optimization
Purpose: Enhances the intermediate code to improve execution efficiency, memory usage, or both.
Process: Optimization can be local (applied to a small section of code, such as a single basic block) or global
(applied across larger regions, up to the entire program). Techniques include constant folding, dead code
elimination, loop unrolling, and inline expansion. Every transformation must preserve the meaning of the
program, so this step balances improved performance against guaranteed correctness.
Output: Optimized intermediate code.
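Constant folding, the first technique listed above, replaces an operation on two known constants with its result at compile time. The sketch below works on a nested-tuple expression tree and folds only integer +, - and *; the tree shape and the FOLDABLE table are assumptions for the example.

```python
import operator

# Operators this sketch is willing to evaluate at compile time.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(node):
    """Return an equivalent tree with constant subexpressions evaluated."""
    if not isinstance(node, tuple):
        return node                       # leaf: nothing to fold
    op, left, right = node
    left, right = fold_constants(left), fold_constants(right)
    if op in FOLDABLE and isinstance(left, int) and isinstance(right, int):
        return FOLDABLE[op](left, right)  # both operands known: compute now
    return (op, left, right)
```

For instance, ("+", "x", ("*", 2, 3)) becomes ("+", "x", 6); the addition stays because x is not known until run time.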
6. Code Generation
Purpose: Converts the optimized intermediate code into machine code or assembly language for a specific
target architecture.
Process: The code generator maps intermediate code operations to machine instructions. It selects appropriate
registers, assigns memory locations, and generates actual machine instructions that the hardware can execute.
The generated code is in binary or assembly format, depending on the compiler design.
Output: Target-specific machine code or assembly code.
7. Code Linking and Loading
Purpose: Combines the generated code with libraries and prepares the program for execution.
Process: The linker takes the generated machine code, adds external libraries or modules, and resolves function
calls and variables from external files. This is especially important in languages like C and C++ where modular
programming relies on separate compilation of modules.
Output: An executable program that is ready to load and execute.
The cousins of the compiler are system software tools related to the compiler in function and purpose. They
each contribute to the process of transforming and executing code but operate differently. Here are the main
cousins:
1. Preprocessor
o The preprocessor processes the source code before it reaches the compiler. It handles tasks like
macro expansion, file inclusion (e.g., #include in C/C++), and conditional compilation (#ifdef
and #endif).
o It modifies the source code based on directives and then passes the modified code to the
compiler.
2. Assembler
o The assembler converts assembly language code into machine code or object code.
o Assemblers are crucial in converting low-level instructions written in assembly into binary code
that the computer can execute directly.
3. Interpreter
o An interpreter executes a program line-by-line or statement-by-statement without converting
the entire code to machine code in advance.
o Unlike a compiler, an interpreter translates high-level code on the fly, which makes it useful for
debugging but generally slower for execution compared to compiled programs.
4. Linker
o The linker takes multiple object files (often generated by the compiler) and combines them into
a single executable program.
o It resolves references between different modules, combining library code and ensuring all
necessary resources are available in the final executable.
5. Loader
o The loader places the executable program into memory and prepares it for execution.
o It handles memory allocation, linking of shared libraries (dynamic linking), and setting up the
environment so the program can run smoothly on the operating system.
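To make the first item in the list above more concrete, the sketch below models two preprocessor tasks: expansion of simple macros and conditional compilation with #ifdef and #endif. It is far smaller than a real C preprocessor (no file inclusion, no macros with parameters, no #else), and the function name preprocess is an assumption for the example.

```python
import re

def preprocess(source):
    """Expand simple #define macros and honour #ifdef / #endif blocks."""
    macros = {}
    output = []
    skipping = []                 # one flag per open #ifdef: True = skip its body
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#ifdef"):
            name = stripped.split()[1]
            skipping.append(name not in macros)
        elif stripped.startswith("#endif"):
            skipping.pop()
        elif any(skipping):
            continue              # inside a block whose macro is undefined
        elif stripped.startswith("#define"):
            parts = stripped.split(maxsplit=2)
            macros[parts[1]] = parts[2] if len(parts) > 2 else ""
        else:
            # Replace every identifier that names a macro with its body.
            output.append(re.sub(r"[A-Za-z_]\w*",
                                 lambda m: macros.get(m.group(), m.group()),
                                 line))
    return "\n".join(output)
```

Given the five lines "#define N 10", "#ifdef DEBUG", "print(N)", "#endif", "x = N + 1", only "x = 10 + 1" is passed on, because DEBUG was never defined.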
Intro:
Compiler: A compiler is a magic box that converts a high-level language program into a machine
language program.
OR
A compiler is a software program that converts a high-level language program into machine language,
which can be executed by a computer.
High-Level Language -> Compiler -> Machine-Level Language
i. Lexical analysis: Recognition of basic elements (tokens) and creation of uniform symbols.
ii. Syntax analysis: Recognition of basic syntactic constructs through the reduction table.
iii. Interpretation phase: Definition of exact meaning, and creation of the matrix
and tables by action routines.
iv. Machine-independent optimization: Creation of a more optimal matrix by removing
duplicate entries from the matrix.
v. Storage assignment: It makes entries in the matrix that allow code generation to create
code that allocates dynamic storage, and that allow the assembly phase to reserve the proper
amount of storage.
vi. Code generation: A macro processor is used to produce more optimal assembly code.
vii. Assembly and output: It resolves symbolic addresses and generates the machine language.
iii. Literal table: It describes all literal constants used in the source program. It consists of six fields:
Literal   Base      Scale   Precision   Other information   Address
31        Decimal   Fixed   2
2         Decimal   Fixed   1
100       Decimal   Fixed   3
iv. Identifier table: It describes all identifiers used in the source program. It consists of three fields:
Name   Data attribute   Address
Algorithm:
Implementation:
i. The input string is separated into tokens by break characters. Break characters are
denoted by the contents of a special field in the terminal table.
ii. Lexical analysis recognizes three types of tokens: terminal symbols [TRM], identifiers [IDN], and literals [LIT].
iii. if symbol is in the TERMINAL table then
         create a uniform symbol table entry of type TRM
     else if symbol is an identifier then
         create a uniform symbol table entry of type IDN
     else
         create a uniform symbol table entry of type LIT
     end if
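The classification step above can be sketched in Python as follows. The contents of the terminal table, the use of a regular expression to split on break characters, and the choice to store an index into the identifier or literal table in each uniform symbol are assumptions made for this example.

```python
import re

# Hypothetical terminal table: operators and break characters of a toy language.
TERMINALS = {"=", "+", "-", "*", "/", "(", ")", ";"}

def build_uniform_symbols(source):
    """Return (uniform_symbols, identifier_table, literal_table)."""
    identifiers, literals, uniform = [], [], []
    # Names, numbers, and single non-blank characters each form one token.
    for symbol in re.findall(r"[A-Za-z_]\w*|\d+|\S", source):
        if symbol in TERMINALS:
            uniform.append(("TRM", symbol))
        elif symbol[0].isalpha() or symbol[0] == "_":
            if symbol not in identifiers:
                identifiers.append(symbol)        # first occurrence: new entry
            uniform.append(("IDN", identifiers.index(symbol)))
        elif symbol.isdigit():
            if symbol not in literals:
                literals.append(symbol)
            uniform.append(("LIT", literals.index(symbol)))
        else:
            raise ValueError(f"unrecognized symbol {symbol!r}")
    return uniform, identifiers, literals
```

For the input "A = B + 31;", the identifier table becomes ["A", "B"], the literal table becomes ["31"], and the uniform symbol table records the sequence IDN, TRM, IDN, TRM, LIT, TRM.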
ii. Stack: The stack is the collection of uniform symbols currently being worked on. It is
organized as a LIFO (last-in, first-out) structure.
iii. Reduction table: The syntax rules of the source language are contained in the reduction
table. The general form of a reduction rule is:
Label: old top of stack / action routine / new top of stack / next reduction
Step 1: Place the matrix in a form in which common subexpressions can be recognized.
Step 2: Recognize two subexpressions as being equivalent.
Step 3: Eliminate one of them.
Step 4: Alter the rest of the matrix to reflect the elimination of this entry.
For example:
B = A
A = C * D * (C * D + B)
Step 1:
      Operator   Operand 1   Operand 2   Backward pointer   Forward pointer
M1    =          B           A           0                  2
M2    *          C           D           1                  3
M3    +          M2          B           2                  4
M4    *          C           D           3                  5
M5    *          M4          M3          4                  6
M6    =          M5          A           5                  ?
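Steps 1 to 4 can be sketched on the matrix above as follows. This is an illustrative reading of the steps, not the exact routine of any particular compiler: the matrix is held as a Python list (so list order stands in for the backward and forward pointers), operands of + and * are sorted to make equal subexpressions look identical, and an assignment conservatively invalidates every remembered subexpression that mentions either of its operands.

```python
COMMUTATIVE = {"+", "*"}

def eliminate_common_subexpressions(matrix):
    """matrix: list of (operator, operand1, operand2); entry i is named M{i+1}.

    Returns the surviving entries as (name, operator, operand1, operand2).
    """
    available = {}   # normalized (op, a, b) -> name of the entry holding its value
    renamed = {}     # eliminated entry name -> surviving entry name
    result = []
    for index, (op, a, b) in enumerate(matrix, start=1):
        name = f"M{index}"
        # Step 4: redirect operands that referred to an eliminated entry.
        a, b = renamed.get(a, a), renamed.get(b, b)
        if op == "=":
            # A variable changes here, so expressions using it are no longer valid.
            available = {key: value for key, value in available.items()
                         if a not in key[1:] and b not in key[1:]}
            result.append((name, op, a, b))
            continue
        # Step 1: canonical operand order for commutative operators.
        key = (op, *sorted((a, b))) if op in COMMUTATIVE else (op, a, b)
        if key in available:
            renamed[name] = available[key]    # Steps 2 and 3: drop the duplicate
        else:
            available[key] = name
            result.append((name, op, a, b))
    return result
```

Applied to the six entries above, the sketch drops M4 (a duplicate of M2) and rewrites M5 as M2 * M3, leaving M1, M2, M3, M5, and M6.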
The literal table is scanned similarly: a location is assigned to each literal, and a matrix entry of the following form is made:
LIT   Size   Operand
For example: A = B + C - D
Matrix              Original Code   Better Code
M1   +   B    C     L 1, B          L 1, B
                    A 1, C          A 1, C
                    ST 1, M1
M2   -   M1   D     L 1, M1
                    S 1, D          S 1, D
                    ST 1, M2
M3   =   M2   A     L 1, M2
                    ST 1, A         ST 1, A
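The difference between the two code columns can be reproduced with a small generator that tracks what register 1 currently holds. The sketch below is an assumption-laden illustration: it uses a single register, handles only +, - and =, treats the second operand of = as the destination (as in the M3 row above), and the names OPCODES and generate are invented for the example.

```python
OPCODES = {"+": "A", "-": "S"}    # mnemonics as they appear in the table above

def generate(matrix, optimize=False):
    """matrix: list of (operator, operand1, operand2); entry i is named M{i+1}."""
    code = []
    in_register = None            # name whose value register 1 currently holds
    for index, (op, first, second) in enumerate(matrix):
        name = f"M{index + 1}"
        # Better code: skip the load when the operand is already in the register.
        if not (optimize and in_register == first):
            code.append(f"L 1, {first}")
        if op == "=":
            code.append(f"ST 1, {second}")
            in_register = second
            continue
        code.append(f"{OPCODES[op]} 1, {second}")
        in_register = name
        later = matrix[index + 1:]
        # The result must be saved if it is needed from memory later on:
        # as a second operand anywhere, or as a first operand after the next entry.
        needed_in_memory = (any(entry[2] == name for entry in later)
                            or any(entry[1] == name for entry in later[1:]))
        if not optimize or needed_in_memory:
            code.append(f"ST 1, {name}")
    return code
```

With matrix = [("+", "B", "C"), ("-", "M1", "D"), ("=", "M2", "A")], generate(matrix) returns the eight instructions of the "Original Code" column, and generate(matrix, optimize=True) returns the four instructions of the "Better Code" column.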
5.4 Passes of a Compiler
The following diagram depicts a flowchart of a compiler.
Pass 1: It corresponds to the lexical analysis phase of the compiler. It scans the source program
and creates the identifier, literal, and uniform symbol tables.
Pass 2: It corresponds to the syntax and interpretation phases. It scans the uniform symbol table
and produces the matrix.
Pass 3 through Pass N-3 (Pass 4 when N = 7): They correspond to the optimization phase.
Pass N-2 (Pass 5): It corresponds to the storage assignment phase.
Pass N-1 (Pass 6): It corresponds to the code generation phase. It scans the matrix.
Pass N (Pass 7): It corresponds to the assembly and output phase.
LIST OF COMPILERS