
5CAI4-02: Compiler Design

1. Introduction: Objective, scope and outcome of the course. (01 hour)

2. Introduction: Objective, scope and outcome of the course. Compiler, Translator, Interpreter definitions; Phases of a compiler; Bootstrapping; Review of finite automata; Lexical analyzer; Input; Recognition of tokens; Idea about LEX: a lexical analyzer generator; Error handling. (06 hours)

3. Review of CFG; Ambiguity of grammars; Introduction to parsing; Top-down parsing; LL grammars and parsers; Error handling of LL parsers; Recursive descent parsing; Predictive parsers; Bottom-up parsing; Shift-reduce parsing; LR parsers; Construction of SLR, canonical LR and LALR parsing tables; Parsing with ambiguous grammars; Operator precedence parsing; Introduction to the automatic parser generator YACC; Error handling in LR parsers. (10 hours)

4. Syntax-directed definitions; Construction of syntax trees; S-attributed definitions; L-attributed definitions; Top-down translation; Intermediate code forms using postfix notation, DAG and three-address code; TAC for various control structures; Representing TAC using triples and quadruples; Boolean expressions and control structures. (10 hours)

5. Storage organization; Storage allocation strategies; Activation records; Accessing local and non-local names in a block-structured language; Parameter passing; Symbol table organization; Data structures used in symbol tables. (08 hours)

6. Definition of basic blocks and control flow graphs; DAG representation of basic blocks; Advantages of DAG; Sources of optimization; Loop optimization; Idea about global data flow analysis; Loop invariant computation; Peephole optimization; Issues in the design of a code generator; A simple code generator; Code generation from DAG. (07 hours)

Total: 42 hours

Compiler, Translator, Interpreter definitions:


1. Compiler
The language processor that reads the complete source program written in a high-level language as a whole in one go and translates it into an equivalent program in machine language is called a compiler. Examples: C, C++, C#.
In a compiler, the source code is translated to object code successfully only if it is free of errors. When there are errors in the source code, the compiler reports them at the end of compilation along with line numbers. The errors must be removed before the compiler can successfully recompile the source code. Once compiled, the object program can be executed any number of times without being translated again.
2. Assembler
The assembler is used to translate a program written in assembly language into machine code. The source program given as input to an assembler contains assembly language instructions. The output generated by the assembler is the object code or machine code understandable by the computer.
The assembler is essentially the first interface that lets humans communicate with the machine; it fills the gap between them. Code written in assembly language consists of mnemonics (instructions) such as ADD, MUL, SUB, DIV, MOV and so on, and the assembler converts these mnemonics into binary code. These mnemonics also depend on the architecture of the machine.
For example, the architectures of the Intel 8085 and Intel 8086 are different.

3. Interpreter
An interpreter is a language processor that translates a single statement of the source program into machine code and executes it immediately, before moving on to the next line. If there is an error in a statement, the interpreter stops translating at that statement and displays an error message; it moves on to the next line only after the error has been removed. An interpreter directly executes instructions written in a programming or scripting language without previously converting them to object code or machine code: it translates one line at a time and then executes it.
Examples: Perl, Python and MATLAB.
Phases of a compiler

A compiler is a software tool that converts high-level programming code into machine code that a
computer can understand and execute. It acts as a bridge between human-readable code and machine-
level instructions, enabling efficient program execution. The process of compilation is divided into six
phases:
1. Lexical Analysis: The first phase, where the source code is broken down into tokens such as
keywords, operators, and identifiers for easier processing.
2. Syntax Analysis or Parsing: This phase checks if the source code follows the correct syntax
rules, building a parse tree or abstract syntax tree (AST).
3. Semantic Analysis: It ensures the program’s logic makes sense, checking for errors like type
mismatches or undeclared variables.
4. Intermediate Code Generation: In this phase, the compiler converts the source code into an
intermediate, machine-independent representation, simplifying optimization and translation.
5. Code Optimization: This phase improves the intermediate code to make it run more efficiently,
reducing resource usage or increasing speed.
6. Target Code Generation: The final phase where the optimized code is translated into the target
machine code or assembly language that can be executed on the computer.
These six phases are grouped into two main parts, the front end and the back end, with the intermediate code generation phase acting as the link between them. The front end analyzes the source code for syntax and semantics and generates intermediate code while ensuring correctness. The back end optimizes this intermediate code and converts it into efficient machine code for execution. The front end is mostly machine-independent, while the back end is machine-dependent.
The compilation process is an essential part of transforming high-level source code into machine-
readable code. A compiler performs this transformation through several phases, each with a specific
role in making the code efficient and correct. Broadly, the compilation process can be divided into two
main parts:
1. Analysis Phase: The analysis phase breaks the source program into its basic components and
creates an intermediate representation of the program. It is sometimes referred to as front end.
2. Synthesis Phase: The synthesis phase creates the final target program from the intermediate
representation. It is sometimes referred to as back end.

1. Lexical Analysis
Lexical analysis is the first phase of a compiler, responsible for converting the raw source code into a
sequence of tokens. A token is the smallest unit of meaningful data in a programming language. Lexical
analysis involves scanning the source code, recognizing patterns, and categorizing groups of characters
into distinct tokens.
The lexical analyzer scans the source code character by character, grouping these characters into
meaningful units (tokens) based on the language's syntax rules. These tokens can represent keywords,
identifiers, constants, operators, or punctuation marks. By converting the source code into tokens,
lexical analysis simplifies the process of understanding and processing the code in later stages of
compilation.
Example: int x = 10;
The lexical analyzer would break this line into the following tokens:
int - Keyword token (data type)
x - Identifier token (variable name)
= - Operator token (assignment operator)
10 - Numeric literal token (integer value)
; - Punctuation token (semicolon, used to terminate statements)
Each of these tokens is then passed on to the next phase of the compiler for further processing, such as
syntax analysis.
2. Syntax Analysis
Syntax analysis, also known as parsing, is the second phase of a compiler where the structure of the
source code is checked. This phase ensures that the code follows the correct grammatical rules of the
programming language.
The role of syntax analysis is to verify that the sequence of tokens produced by the lexical analyzer is
arranged in a valid way according to the language's syntax. It checks whether the code adheres to the
language's rules, such as correct use of operators, keywords, and parentheses. If the source code is not
structured correctly, the syntax analyzer will generate errors.
To represent the structure of the source code, syntax analysis uses parse trees or syntax trees.
 Parse Tree: A parse tree is a tree-like structure that represents the syntactic structure of the
source code. It shows how the tokens relate to each other according to the grammar rules. Each
branch in the tree represents a production rule of the language, and the leaves represent the
tokens.
 Syntax Tree: A syntax tree is a more abstract version of the parse tree. It represents the
hierarchical structure of the source code but with less detail, focusing on the essential syntactic
structure. It helps in understanding how different parts of the code relate to each other.

(Figure: parse tree)
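For illustration (this sketch is not part of the original notes, and the type and field names are hypothetical), a syntax-tree node for simple expressions could be represented in C roughly as follows:

/* A sketch of a syntax-tree node for simple expressions. */
typedef enum { NODE_NUM, NODE_ID, NODE_BINOP } NodeKind;

typedef struct AstNode {
    NodeKind kind;
    char op;                      /* '+', '-', '*', '/' when kind == NODE_BINOP */
    int value;                    /* literal value when kind == NODE_NUM        */
    char name[32];                /* identifier name when kind == NODE_ID       */
    struct AstNode *left, *right; /* operand subtrees of a binary operator      */
} AstNode;

A parser would allocate one such node per operator or operand and link them into a tree that later phases can walk.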
3. Semantic Analysis
Semantic analysis is the phase of the compiler that ensures the source code makes sense logically. It
goes beyond the syntax of the code and checks whether the program has any semantic errors, such as
type mismatches or undeclared variables.
Semantic analysis checks the meaning of the program by validating that the operations performed in
the code are logically correct. This phase ensures that the source code follows the rules of the
programming language in terms of its logic and data usage.
Some key checks performed during semantic analysis include:
 Type Checking: The compiler ensures that operations are performed on compatible data types.
For example, trying to add a string and an integer would be flagged as an error because they
are incompatible types.
 Variable Declaration: It checks whether variables are declared before they are used. For
example, using a variable that has not been defined earlier in the code would result in a semantic
error.
Example:
int a = 5;
float b = 3.5;
a = a + b;
Type Checking:
 a is int and b is float. Adding them (a + b) results in float, which cannot be assigned to int a.
 Error: Type mismatch: cannot assign float to int.
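As a rough illustration of how such a check might be performed (added here; the type tags and function names are hypothetical, not part of the notes), a semantic analyzer could use a small type-compatibility routine:

#include <stdio.h>

/* Hypothetical type tags used by the checker. */
typedef enum { T_INT, T_FLOAT, T_ERROR } Type;

/* Result type of an arithmetic expression: int combined with float promotes to float. */
Type arith_type(Type lhs, Type rhs) {
    if (lhs == T_ERROR || rhs == T_ERROR) return T_ERROR;
    return (lhs == T_FLOAT || rhs == T_FLOAT) ? T_FLOAT : T_INT;
}

/* Check an assignment "target = value" under the strict no-narrowing rule
   assumed in the example above. */
Type check_assign(Type target, Type value) {
    if (target == T_INT && value == T_FLOAT) {
        fprintf(stderr, "Type mismatch: cannot assign float to int\n");
        return T_ERROR;
    }
    return target;
}

int main(void) {
    Type a = T_INT, b = T_FLOAT;
    check_assign(a, arith_type(a, b));   /* reports the mismatch from the example */
    return 0;
}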
4. Intermediate Code Generation
Intermediate code is a form of code that lies between the high-level source code and the final machine
code. It is not specific to any particular machine, making it portable and easier to optimize. Intermediate
code acts as a bridge, simplifying the process of converting source code into executable code.
The use of intermediate code plays a crucial role in optimizing the program before it is turned into
machine code.
 Platform Independence: Since the intermediate code is not tied to any specific hardware, it can
be easily optimized for different platforms without needing to recompile the entire source code.
This makes the process more efficient for cross-platform development.
 Simplifying Optimization: Intermediate code simplifies the optimization process by providing
a clearer, more structured view of the program. This makes it easier to apply optimization
techniques such as:
o Dead Code Elimination: Removing parts of the code that don’t affect the program’s
output.
o Loop Optimization: Improving loops to make them run faster or consume less memory.
o Common Subexpression Elimination: Reusing previously calculated values to avoid
redundant calculations.
 Easier Translation: Intermediate code is often closer to machine code, but not specific to any
one machine, making it easier to convert into the target machine code. This step is typically
handled in the back end of the compiler, allowing for smoother and more efficient code
generation.
Example: a = b + c * d;
t1 = c * d
t2 = b + t1
a = t2
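Unit 4 of the syllabus also mentions representing TAC using triples and quadruples. A minimal sketch (added here, with hypothetical names) of storing the three instructions above as quadruples of the form (operator, arg1, arg2, result):

#include <stdio.h>

typedef struct {
    const char *op, *arg1, *arg2, *result;
} Quad;

int main(void) {
    Quad code[] = {
        { "*", "c",  "d",  "t1" },   /* t1 = c * d  */
        { "+", "b",  "t1", "t2" },   /* t2 = b + t1 */
        { "=", "t2", "",   "a"  },   /* a  = t2     */
    };
    for (int i = 0; i < 3; i++)
        printf("(%s, %s, %s, %s)\n",
               code[i].op, code[i].arg1, code[i].arg2, code[i].result);
    return 0;
}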
5. Code Optimization
Code Optimization is the process of improving the intermediate or target code to make the program run
faster, use less memory, or be more efficient, without altering its functionality. It involves techniques
like removing unnecessary computations, reducing redundancy, and reorganizing code to achieve better
performance. Optimization is classified broadly into two types:
 Machine-Independent
 Machine-Dependent
Common Techniques:
 Constant Folding: Precomputing constant expressions.
 Dead Code Elimination: Removing unreachable or unused code.
 Loop Optimization: Improving loop performance through invariant code motion or unrolling.
 Strength Reduction: Replacing expensive operations with simpler ones.
Example:

Code Before Optimization:

for (int j = 0; j < n; j++)
{
    x = y + z;
    a[j] = 6 * j;
}

Code After Optimization (the loop-invariant statement x = y + z is moved out of the loop):

x = y + z;
for (int j = 0; j < n; j++)
{
    a[j] = 6 * j;
}
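As a further added illustration (not part of the original notes), constant folding and strength reduction, two of the techniques listed above, might transform code like this:

/* Before optimization: the constant expression 60 * 60 * 24 is evaluated at
   run time, and the loop multiplies on every iteration. */
void fill_before(int *b, int n) {
    int seconds_per_day = 60 * 60 * 24;
    for (int i = 0; i < n; i++)
        b[i] = i * 8 + seconds_per_day;
}

/* After optimization (a conceptual sketch of what the compiler may produce):
   the constant is folded at compile time, and the multiplication is replaced
   by a cheaper running addition (strength reduction). */
void fill_after(int *b, int n) {
    int seconds_per_day = 86400;
    for (int i = 0, k = 0; i < n; i++, k += 8)
        b[i] = k + seconds_per_day;
}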

6. Code Generation
Code Generation is the final phase of a compiler, where the intermediate representation of the source
program (e.g., three-address code or abstract syntax tree) is translated into machine code or assembly
code. This machine code is specific to the target platform and can be executed directly by the hardware.
The code generated by the compiler is an object code of some lower-level programming language, for
example, assembly language. The source code written in a higher-level language is transformed into a
lower-level language that results in a lower-level object code, which should have the following
minimum properties:
 It should carry the exact meaning of the source code.
 It should be efficient in terms of CPU usage and memory management.
Example:
Three Address Code:

t1 = c * d
t2 = b + t1
a = t2

Assembly Code:

LOAD R1, c    ; Load the value of 'c' into register R1
LOAD R2, d    ; Load the value of 'd' into register R2
MUL R1, R2    ; R1 = c * d, store result in R1
LOAD R3, b    ; Load the value of 'b' into register R3
ADD R3, R1    ; R3 = b + (c * d), store result in R3
STORE a, R3   ; Store the final result in variable 'a'

Symbol Table - It is a data structure used and maintained by the compiler, consisting of all the identifiers' names along with their types. It helps the compiler function smoothly by finding identifiers quickly.
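A rough sketch (added here; the field names are hypothetical) of how a simple symbol table and its lookup operation might be organized:

#include <string.h>

#define MAX_SYMBOLS 256

/* One entry per identifier: its name, type, scope level and storage offset. */
struct SymbolEntry {
    char name[32];
    char type[16];      /* e.g. "int", "float"                   */
    int  scope_level;   /* block nesting depth                    */
    int  offset;        /* location within the activation record */
};

static struct SymbolEntry table[MAX_SYMBOLS];
static int symbol_count = 0;

/* Linear lookup by name; real compilers typically use a hash table instead. */
struct SymbolEntry *lookup(const char *name) {
    for (int i = 0; i < symbol_count; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return 0;   /* not found */
}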
Error Handling in Phases of Compiler
Error Handling refers to the mechanism in each phase of the compiler to detect, report and recover from
errors without terminating the entire compilation process.
 Lexical Analysis: Detects errors in the character stream and ensures valid token formation.
o Example: Identifies illegal characters or invalid tokens (e.g., @var as an identifier).
 Syntax Analysis: Checks for structural or grammatical errors based on the language's grammar.
o Example: Detects missing semicolons or unmatched parentheses.
 Semantic Analysis: Verifies the meaning of the code and ensures it follows language semantics.
o Example: Reports undeclared variables or type mismatches (e.g., adding a string to an
integer).
 Intermediate Code Generation: Ensures the correctness of intermediate representations used in
further stages.
o Example: Detects invalid operations, such as dividing by zero.
 Code Optimization: Ensures that the optimization process doesn’t produce errors or alter code
functionality.
o Example: Identifies issues with unreachable or redundant code.
 Code Generation: Handles errors in generating machine code or allocating resources.
o Example: Reports insufficient registers or invalid machine instructions.
Bootstrapping
Bootstrapping is an important technique in compiler design, where a basic compiler is used to create a
more advanced version of itself. This process helps in building compilers for new programming
languages and improving the ones already in use. By starting with a simple compiler, bootstrapping
allows gradual improvements and makes the compiler more efficient over time.
 Bootstrapping relies on the idea of a self-compiling compiler, where each iteration improves
the compiler's ability to handle more complex code.
 It simplifies the development cycle, allowing incremental improvements and faster deployment
of more robust compilers.
 Many successful programming languages, including C and Java, have used bootstrapping
techniques during their development.

A compiler can be represented using a T Diagram. In this diagram, the source language of the compiler
is positioned at the top-left, the target language (the language produced by the compiler) is placed at the
top-right, and the language in which the compiler is implemented is shown at the bottom.
Working of Bootstrapping
Bootstrapping is the process of creating compilers. It involves a methodology where a slightly more
complicated compiler is created using a simple language (such as assembly language). This slightly
more complicated compiler, in turn, is used to create an even more advanced compiler, and this process
continues until the desired result is achieved.
Here’s a step-by-step look at how bootstrapping works in compiler design:
Step 1: Start with a Basic Compiler
The first step is to create a basic compiler that can handle the most essential features of a programming
language. This simple compiler is often written in assembly language or machine language to make it
easier to build.
Step 2: Use the Basic Compiler to Create a More Advanced Version
Once the basic compiler is ready, it is used to compile a more advanced version of itself. This new
version can handle more complex features, like better error checking and optimizations.
Step 3: Gradually Improve the Compiler
With each new version, the compiler becomes more capable. The process is repeated, and each iteration
adds more features, making the compiler stronger and more efficient.
For example, let us assume we want a compiler that takes the C language as input and generates assembly language as output.
1. To generate this compiler, we first write a compiler for a small subset of C, say C0, in assembly language. A subset of C means the C language with reduced functionality.
In the T diagram for this step, the source language is the subset of C (C0), the target language is assembly language, and the implementation language is also assembly language.
2. Then, using the C0 language, we write a compiler for the full C language. This compiler, written in C0, takes C as its source language and generates assembly language as its target language.

With the help of bootstrapping, we have thus generated a compiler for C written in (a subset of) C itself, i.e. a self-compiling compiler.
Cross-Compilation Using Bootstrapping
Cross-compilation is a process where a compiler runs on one platform (host) but generates machine
code for a different platform (target). This is useful when the target platform is not powerful enough to
run the full compiler or when the target architecture is different from the host system. Using
bootstrapping in cross-compilation can help create a compiler that runs on one system (the host) but
produces code for another system (the target).
For example, suppose we want to write a cross-compiler for a new language X that generates code in language Z. We start by using an existing compiler Y (running on machine M) to compile a simple version of language X into language Z. The first step is to create a basic compiler XYZ, written in Y, that translates X code into Z code. Compiling it with Y on machine M results in a cross-compiler XMZ, which generates target code in language Z for source code written in language X but runs on machine M. This method allows us to create a compiler for a new language without needing to run it on the target system directly.


Advantages
1. Improved Efficiency: Bootstrapping makes the development process faster. Once you have a
basic compiler, you can use it to create more advanced versions, making the entire process of
building a complex compiler much quicker. It’s like building a tool that helps you build better
versions of itself.
2. Portability: Bootstrapping helps create compilers that can work across different systems. Once
you’ve bootstrapped a compiler, it can be used to generate code for various platforms, making
it more flexible and portable.
3. Reduced Dependency: With bootstrapping, you don’t need to rely on other compilers or
external tools. As long as you have a simple starting compiler, you can use it to build more
complex versions and handle all your compiling needs. This reduces the need for other software
or external dependencies.
Challenges and Limitations
1. Initial Effort: Writing the very first version of the compiler is tough. Before bootstrapping can
even begin, you need to create a simple, working compiler from scratch, which requires a lot
of time and effort.
2. Complexity of Self-Compilation: A compiler that can compile itself sounds great, but it’s not
easy to make. Building a self-compiling compiler adds complexity because you have to ensure
that each version can handle more features and still work as expected.
3. Time Consumption: Bootstrapping is not something that happens overnight. It takes time and
resources, especially during the early stages. You have to repeatedly build and improve versions
of the compiler, which can be a slow process in the beginning.
Conclusion
Bootstrapping is a powerful technique in compiler design that enables the creation of advanced
compilers by starting with a simple, basic version. This process allows for gradual improvement and
self-compilation, making the development of complex compilers more efficient over time. It simplifies
the development cycle by enabling incremental improvements, and its ability to generate compilers that
are portable across different systems makes it a valuable tool for creating cross-platform compilers.

Review of finite automata and the lexical analyzer:


Lexical analysis, also known as scanning, is the first phase of a compiler. It involves reading the source program character by character from left to right and organizing the characters into tokens. Tokens are meaningful sequences of characters. There are usually only a small number of token categories for a programming language, including constants (such as integers, doubles, characters, and strings), operators (arithmetic, relational, and logical), punctuation marks and reserved keywords.
What is a Lexeme?
A lexeme is an actual string of characters that matches with a pattern and generates a token.
eg- “float”, “abs_zero_Kelvin”, “=”, “-”, “273”, “;” .
Lexemes and Tokens Representation

The lexeme/token pairs below correspond to a statement such as while (a >= b) a = a - 2;

Lexeme      Token
while       WHILE
(           LPAREN
a           IDENTIFIER
>=          COMPARISON
b           IDENTIFIER
)           RPAREN
a           IDENTIFIER
=           ASSIGNMENT
a           IDENTIFIER
-           ARITHMETIC
2           INTEGER
;           SEMICOLON

How Lexical Analyzer Works?


Tokens in a programming language can be described using regular expressions. A scanner, or lexical
analyzer, uses a Deterministic Finite Automaton (DFA) to recognize these tokens, as DFAs are designed
to identify regular languages. Each final state of the DFA corresponds to a specific token type, allowing
the scanner to classify the input. The process of creating a DFA from regular expressions can be
automated, making it easier to handle token recognition efficiently.
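A minimal hand-written sketch of this idea (added here, not part of the original notes): a scanner that walks the input character by character and classifies identifiers and integers, much as a table-driven DFA would.

#include <ctype.h>
#include <stdio.h>

/* Token classes this tiny scanner recognizes. */
typedef enum { TOK_ID, TOK_NUM, TOK_OTHER, TOK_EOF } TokenKind;

/* Scan one token starting at *p, advance *p past it, and copy the lexeme out. */
TokenKind next_token(const char **p, char *lexeme) {
    while (**p == ' ' || **p == '\t') (*p)++;           /* skip blanks            */
    if (**p == '\0') return TOK_EOF;

    int n = 0;
    if (isalpha((unsigned char)**p)) {                   /* identifier "state"     */
        while (isalnum((unsigned char)**p)) lexeme[n++] = *(*p)++;
        lexeme[n] = '\0';
        return TOK_ID;
    }
    if (isdigit((unsigned char)**p)) {                   /* number "state"         */
        while (isdigit((unsigned char)**p)) lexeme[n++] = *(*p)++;
        lexeme[n] = '\0';
        return TOK_NUM;
    }
    lexeme[0] = *(*p)++;                                 /* operators, punctuation */
    lexeme[1] = '\0';
    return TOK_OTHER;
}

int main(void) {
    const char *src = "a = b + 10 ;";
    char lexeme[64];
    TokenKind kind;
    while ((kind = next_token(&src, lexeme)) != TOK_EOF)
        printf("token class %d, lexeme \"%s\"\n", (int)kind, lexeme);
    return 0;
}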
The lexical analyzer also identifies errors with the help of this automaton and the lexical rules of the given language (such as C or C++), and reports the row and column number of each error.
Suppose we pass the statement a = b + c; through the lexical analyzer. It will generate a token sequence like id = id + id;, where each id refers to its variable's entry in the symbol table, which holds all of its details. For example, consider the program
int main()
{
// 2 variables
int a, b;
a = 10;
return 0;
}
All the valid tokens are:
'int' 'main' '(' ')' '{' 'int' 'a' ',' 'b' ';'
'a' '=' '10' ';' 'return' '0' ';' '}'
Above are the valid tokens. You can observe that we have omitted comments. As another example, a printf statement such as printf("hello"); contains 5 valid tokens: printf, (, the string literal, ), and ;.
Exercise 1: Count number of tokens:
int main()
{
int a = 10, b = 20;
printf("sum is:%d",a+b);
return 0;
}
Answer: Total number of tokens: 27.
Exercise 2: Count number of tokens:
int max(int i);
 The lexical analyzer first reads int, finds it to be valid and accepts it as a token.
 max is read and found to be a valid function name after reading (
 int is also a token, then i is another token, then ) and finally ;
Answer: Total number of tokens is 7: int, max, (, int, i, ), ;
Advantages
 Simplifies Parsing: Breaking down the source code into tokens makes it easier for computers
to understand and work with the code. This helps programs like compilers or interpreters to
figure out what the code is supposed to do. It's like breaking down a big puzzle into smaller
pieces, which makes it easier to put together and solve.
 Error Detection: Lexical analysis will detect lexical errors such as misspelled keywords or
undefined symbols early in the compilation process. This helps in improving the overall
efficiency of the compiler or interpreter by identifying errors sooner rather than later.
 Efficiency: Once the source code is converted into tokens, subsequent phases of compilation
or interpretation can operate more efficiently. Parsing and semantic analysis become faster and
more streamlined when working with tokenized input.
Disadvantages
 Limited Context: Lexical analysis operates based on individual tokens and does not consider
the overall context of the code. This can sometimes lead to ambiguity or misinterpretation of
the code's intended meaning especially in languages with complex syntax or semantics.
 Overhead: Although lexical analysis is necessary for the compilation or interpretation process,
it adds an extra layer of overhead. Tokenizing the source code requires additional computational
resources which can impact the overall performance of the compiler or interpreter.
 Debugging Challenges: Lexical errors detected during the analysis phase may not always
provide clear indications of their origins in the original source code. Debugging such errors can
be challenging especially if they result from subtle mistakes in the lexical analysis process.

Recognition of tokens:

What is a Token?
In programming, a token is the smallest unit of meaningful data; it may be an identifier, keyword,
operator, or symbol. A token represents a series or sequence of characters that cannot be decomposed
further. In languages such as C, some examples of tokens would include:
 Keywords : Those reserved words in C like ` int `, ` char `, ` float `, ` const `, ` goto `, etc.
 Identifiers: Names of variables and user-defined functions.
 Operators : ` + `, ` - `, ` * `, ` / `, etc.
 Delimiters/Punctuators: symbols such as commas ` , `, semicolons ` ; ` and braces ` {} `.
By and large, tokens may be divided into three categories:
 Terminal Symbols (TRM) : Keywords and operators.
 Literals (LIT) : Values like numbers and strings.
 Identifiers (IDN) : Names defined by the user.
Let's understand now how to calculate tokens in a source code (C language):
Example 1:
int a = 10; //Input Source code

Tokens
int (keyword), a(identifier), =(operator), 10(constant) and ;(punctuation-semicolon)
Answer - Total number of tokens = 5
Example 2:
int main() {

// printf() sends the string inside quotation to


// the standard output (the display)
printf("Welcome to GeeksforGeeks!");
return 0;
}
Tokens
'int', 'main', '(', ')', '{', 'printf', '(', ' "Welcome to GeeksforGeeks!" ',
')', ';', 'return', '0', ';', '}'
Answer - Total number of tokens = 14
What is a Lexeme?
A lexeme is a sequence of source code that matches one of the predefined patterns and thereby forms a
valid token. For example, in the expression `x + 5`, both `x` and `5` are lexemes that correspond to
certain tokens. These lexemes follow the rules of the language in order for them to be recognized as
valid tokens.
Example:
main is a lexeme of type identifier (token)
(, ), {, } are lexemes of type punctuation (token)


What is a Pattern?
A pattern is a rule or syntax that designates how tokens are identified in a programming language. It specifies the sequences of characters or symbols that make up valid tokens and provides the scanner with guidelines for recognizing them correctly.

Example in a Programming Language (C, C++)

For a keyword to be identified as a valid token, the pattern is the exact sequence of characters that makes up the keyword.
For an identifier to be identified as a valid token, the pattern is the predefined rule that it must start with a letter, followed by letters or digits.
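As a small added illustration, the identifier pattern described above could be checked with a routine like the following (underscores, which C also allows in identifiers, are omitted here for simplicity):

#include <ctype.h>

/* Returns 1 if the lexeme matches the identifier pattern:
   a letter followed by any number of letters or digits. */
int matches_identifier(const char *lexeme) {
    if (!isalpha((unsigned char)lexeme[0]))
        return 0;
    for (int i = 1; lexeme[i] != '\0'; i++)
        if (!isalnum((unsigned char)lexeme[i]))
            return 0;
    return 1;
}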
Difference Between Token, Lexeme, and Pattern

Definition:
  Token: a sequence of characters that is treated as a single unit, as it cannot be broken down further.
  Lexeme: a sequence of characters in the source code that is matched by the predefined language rules for it to be specified as a valid token.
  Pattern: the set of rules that a scanner follows to create a token.

Keyword:
  Token: all the reserved keywords of the language (main, printf, etc.).
  Lexeme: int, goto
  Pattern: the sequence of characters that makes up the keyword.

Identifier:
  Token: the name of a variable, function, etc.
  Lexeme: main, a
  Pattern: it must start with a letter, followed by letters or digits.

Operator:
  Token: all operators are considered tokens.
  Lexeme: +, =
  Pattern: +, =

Punctuation:
  Token: each kind of punctuation (e.g. semicolon, bracket, comma) is considered a token.
  Lexeme: (, ), {, }
  Pattern: (, ), {, }

Literal:
  Token: string literals, boolean literals, etc. are considered tokens.
  Lexeme: "Welcome to GeeksforGeeks!"
  Pattern: any string of characters (other than the closing quote) between " and "

Output of Lexical Analysis Phase


The output of the lexical analyzer serves as input to the syntax analyzer as a sequence of tokens rather than a series of lexemes, because during syntax analysis the individual lexeme is not important; what matters is the category or class to which it belongs.
Example:
z = x + y;
This statement has the below form for syntax analyzer
<id> = <id> + <id>; //<id>- identifier (token)
The lexical analyzer not only provides a series of tokens but also builds a symbol table that records the tokens found in the source code, excluding whitespace and comments.
Conclusion
Tokens, patterns, and lexemes represent basic elements of any programming language, helping to break
down and start making sense of code. Tokens are the basic units of meaningful things; patterns define
how such units are identified, whereas the lexemes are actual sequences that match patterns. Basically,
understanding these concepts is indispensable in programming and analyzing codes efficiently.

Idea about LEX: A lexical analyzer generator, Error handling.

Lexical Analysis
Lexical analysis is the first phase of the compiler: it takes a stream of characters as input and converts that input into tokens, a process also known as tokenization. Tokens can be classified into various types such as identifier, separator, keyword, operator, constant and special character, and these tokens are stored in the symbol table. Lexical analysis involves three activities:
1. Tokenization: the process of converting a stream of characters into tokens.
2. Error messages: while scanning the input, the lexical analyzer generates error messages for problems such as illegal characters, unmatched strings, and identifiers exceeding the allowed length.
3. Elimination of comments and whitespace: spaces, tabs, blank lines and comments are removed before the tokens are generated.
Automatic Lexical Generator
The automatic lexical generator is a tool that generates the code of a lexical analyzer, which can then be run to perform lexical analysis and produce tokens as output. This approach is widely used in compiler design.
As discussed above, the task of lexical analysis is to take the stream of characters and convert it into tokens.
The lexical generator workflow includes the following steps:
1. In the first step, we give the Lex source program as input to the Lex compiler, which generates the file lex.yy.c as output.
2. In the second step, we give lex.yy.c as input to the C compiler, which generates the executable a.out.
3. The output file a.out then takes the stream of input characters and generates a sequence of tokens as output.
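As a rough sketch (added here, not part of the original notes), the executable produced in step 2 could be driven by a small C main program compiled and linked together with lex.yy.c. yylex(), yytext and yyin are the conventional entry points provided by Lex-generated scanners; the rest of the program is hypothetical.

/* driver.c - a hypothetical driver for the scanner generated by Lex.
   Note: some lex implementations declare yytext as a char array rather
   than a char pointer. */
#include <stdio.h>

extern int yylex(void);   /* scanner entry point generated in lex.yy.c */
extern char *yytext;      /* text of the lexeme just matched           */
extern FILE *yyin;        /* input stream read by the scanner          */

int main(int argc, char **argv)
{
    if (argc > 1)
        yyin = fopen(argv[1], "r");      /* scan a file instead of stdin        */

    int code;
    while ((code = yylex()) != 0)        /* 0 conventionally means end of input */
        printf("token code %d, lexeme \"%s\"\n", code, yytext);
    return 0;
}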
Advantages of Automatic Lexical Generators
Lexical generators help overcome several problems. Their main use is to build lexical analyzers for any language in an easy and efficient manner.
1. Writing a lexical analyzer by hand for a programming language requires a substantial amount of design, coding and testing effort; a lexical generator performs this work efficiently.
2. It is very difficult to hand-craft lexical analyzers for every programming language, as it is a sophisticated process; lexical generators solve that problem.

Error Handling in Compiler Design


During the process of language translation, the compiler can encounter errors. While the compiler might
not always know the exact cause of the error, it can detect and analyze the visible problems. The main
purpose of error handling is to assist the programmer by pointing out issues in their code. Error handling
doesn't happen often compared to other tasks in the compiler, so the time it takes to fix errors isn't a
major concern.
Error handlers are specialized programs designed to manage errors in applications. The best error
handlers prevent errors if possible, recover from them without closing the application, or shut down the
affected application and save the error details in a log file.
In programming, some errors can be prevented. These errors can occur during the syntax phase (such
as typing mistakes or wrong use of special characters like semicolons) or the logical phase (when the
code doesn’t produce the expected results, also called bugs). Syntax errors are usually caught by
proofreading the code, while logical errors are best handled through thorough debugging.
Runtime errors occur while a program is running, often due to issues like insufficient memory or invalid
input data, such as a memory conflict.
Error handler = Error Detection + Error Report + Error Recovery.
Sources of Error in Error Handling
 One common source of error is blank entries in the symbol table. Errors in the program should be detected and reported by the parser.
 Whenever an error occurs, the parser should handle it and continue parsing the rest of the input.
 Although the parser is mostly responsible for checking for errors, errors may occur at various stages of the compilation process.
 Error handling is a process that identifies errors in a program, reports them to the user, and then applies recovery strategies to manage them. During this process, it is important that the compiler's processing time is not slowed down too much.
There are two main types of errors:
1. Run-Time Errors
These errors happen while the program is running. They usually occur because of issues like incorrect
system settings or invalid input data. Examples of run-time errors include:
 Lack of memory to run the program.
 Memory conflicts with other programs.
 Logical errors, where the program doesn’t produce the expected results. These can be fixed by
carefully debugging the code.
2. Compile-Time Errors
These errors occur before the program starts running, during the compilation process. They stop the
program from compiling successfully.
Examples of compile-time errors include:
 Syntax errors (like missing semicolons or incorrect statements).
 Missing file references that prevent the program from compiling.

Finding error or reporting an error


Viable-prefix is the property of a parser that allows early detection of syntax errors.
 Goal: detect an error as soon as possible without consuming further unnecessary input.
 How: detect an error as soon as the prefix of the input does not match a prefix of any string in the language.
Example: for(;) is reported as an error, since a valid for statement requires two semicolons inside the parentheses.
Error Recovery
There are several methods that a compiler uses to recover from errors. These methods help the compiler
continue processing the code instead of stopping immediately.
Common recovery methods include:
 Panic Mode Recovery – Skips erroneous code and resumes from the next valid statement (a sketch follows this list).
 Phase-Level Recovery – Replaces small incorrect code segments with valid ones.
 Error Productions – Recognizes common errors and provides specific suggestions.
 Global Correction – Makes multiple changes to fix errors optimally.
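A minimal sketch of panic-mode recovery in a hand-written parser (added here; the token codes and the tiny grammar are hypothetical): on a syntax error, tokens are discarded until a synchronizing token such as ';' is found, and parsing resumes from there.

#include <stdio.h>

/* Hypothetical token codes used only for this sketch. */
enum { TOK_ID, TOK_ASSIGN, TOK_NUM, TOK_SEMI, TOK_EOF };

/* Stand-in token stream for "x = = 5 ; y = 1 ;" (the second '=' is the error). */
static int stream[] = { TOK_ID, TOK_ASSIGN, TOK_ASSIGN, TOK_NUM, TOK_SEMI,
                        TOK_ID, TOK_ASSIGN, TOK_NUM, TOK_SEMI, TOK_EOF };
static int pos = 0;
static int lookahead;

static int next_token(void) { return stream[pos++]; }

/* Panic-mode recovery: report the error, discard tokens until a synchronizing
   token (';' or end of input) is found, then resume at the next statement. */
static void recover(const char *msg) {
    fprintf(stderr, "syntax error: %s\n", msg);
    while (lookahead != TOK_SEMI && lookahead != TOK_EOF)
        lookahead = next_token();
    if (lookahead == TOK_SEMI)
        lookahead = next_token();
}

/* Parse statements of the form "id = num ;" until end of input. */
int main(void) {
    lookahead = next_token();
    while (lookahead != TOK_EOF) {
        if (lookahead != TOK_ID)     { recover("identifier expected"); continue; }
        lookahead = next_token();
        if (lookahead != TOK_ASSIGN) { recover("'=' expected");        continue; }
        lookahead = next_token();
        if (lookahead != TOK_NUM)    { recover("number expected");     continue; }
        lookahead = next_token();
        if (lookahead != TOK_SEMI)   { recover("';' expected");        continue; }
        lookahead = next_token();
        printf("statement parsed successfully\n");
    }
    return 0;
}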
Advantages of Error Handling in Compiler Design
1. Robustness – Ensures the compiler can handle errors smoothly without crashing, allowing it
to continue processing and provide meaningful error messages.
2. Error Detection – Identifies various errors like syntax, semantic, and type errors to prevent
unexpected program behavior.
3. Error Reporting – Provides clear and precise error messages, helping developers quickly
locate and fix issues.
4. Error Recovery – Tries to fix or bypass errors so the compilation process can continue instead
of stopping abruptly.
5. Incremental Compilation – Allows compiling and testing correct sections of code even if
other parts contain errors, useful for large projects.
6. Efficiency – Saves time by reducing debugging effort with accurate error messages and
recovery mechanisms.
7. Language Development – Helps define and enforce language rules, improving reliability and
consistency in programming.
Disadvantages of Error Handling in Compiler Design
1. Increased Complexity – Makes the compiler harder to develop, test, and maintain due to the
complexity of handling various errors.
2. Reduced Performance – Error handling can slow down compilation if it is computationally
intensive or requires additional processing.
3. Longer Development Time – Implementing and testing an effective error handling system
takes time, delaying the compiler’s development.
4. Difficulty in Error Detection – Some errors may be masked by the error handling system,
making them harder to detect or debug.
