Introduction_Compiler course
Introduction
Outline
• Scope of the course
• Disciplines involved in it
• Abstract view for a compiler
• Front-end and back-end tasks
• Modules
Course scope
• Aim:
• To learn techniques of a modern compiler
• Main reference:
• Compilers: Principles, Techniques, and Tools, Second Edition by Alfred V. Aho,
Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman
• Supplementary references:
• Modern Compiler Implementation in Java, Second Edition by Andrew W. Appel
• Advanced Compiler Design and Implementation by Muchnick
Subjects
• Lexical analysis (Scanning)
• Syntax Analysis (Parsing)
• Syntax Directed Translation
• Intermediate Code Generation
• Run-time environments
• Code Generation
• Machine Independent Optimization
Compiler learning
• Isn’t it an old discipline?
• Yes, it is a well-established discipline
• Its algorithms, methods and techniques were researched and developed in the early
stages of computer science
• There are many compilers around and many tools to generate them
automatically
• So, why do we need to learn it?
• You may never write a full compiler
• But the techniques we learn are useful in many other tasks, such as writing an
interpreter for a scripting language or validating form input
Disciplines involved
• Algorithms
• Languages and machines
• Operating systems
• Computer architectures
What is a Compiler?
• Correctness
• Speed of compilation
• Preserve the meaning of the code
• The speed of the target code
• Recognize legal and illegal program constructs
• Good error reporting/handling
• Code debugging help
Types of Compiler
• A multi-pass compiler processes the source code or syntax tree of a program several
times.
• It divides a large program into multiple smaller parts and processes them.
• It produces multiple intermediate codes.
• Each pass takes the output of the previous pass as its input.
• It requires less memory.
• It is also known as a 'wide compiler'.
Tasks of Compiler
• Compiler construction tools were introduced as computer-related technologies spread all over the
world.
• They are also known as compiler-compilers, compiler-generators or translator writing systems.
• These tools use a specific language or algorithm for specifying and implementing a component of the
compiler.
• The following are examples of compiler construction tools.
• Scanner generators:
• Take regular expressions that describe the tokens as input and generate a lexical analyzer.
• For example, LEX for the Unix operating system.
• Syntax-directed translation engines:
• These tools produce intermediate code by walking the parse tree.
• The goal is to associate one or more translations with each node of the parse tree.
• Parser generators:
• A parser generator takes a grammar as input and automatically generates source code for a parser that can parse streams of
characters according to that grammar.
• Automatic code generators:
• Take intermediate code and convert it into machine language.
• Data-flow engines:
• These tools support code optimization.
• The user supplies the details of the analysis, and the engine examines the intermediate code to find how its parts relate.
• This is also known as data-flow analysis.
• It determines how values are transmitted from one part of the program to another.
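As a rough illustration of what a data-flow analysis computes, the sketch below performs live-variable analysis on straight-line three-address code. The instruction format, the function name `live_variables`, and the sample program are assumptions made for this example; a real data-flow engine also handles branches and loops by iterating to a fixed point.

```python
def live_variables(instructions, live_out):
    """Return the set of live variables just before each instruction.

    instructions: list of (target, [operands]) pairs in program order.
    live_out:     variables still needed after the last instruction.
    Assumes straight-line code (no branches), so one backward sweep suffices.
    """
    live = set(live_out)
    result = []
    for target, operands in reversed(instructions):
        # The target is overwritten here, so its old value is dead above this
        # point; the operands are read here, so they must be live above it.
        live = (live - {target}) | set(operands)
        result.append(set(live))
    result.reverse()
    return result


# Hypothetical program:  t1 = a + b;  t2 = t1 * c;  d = t2
program = [("t1", ["a", "b"]), ("t2", ["t1", "c"]), ("d", ["t2"])]
for (target, operands), live in zip(program, live_variables(program, {"d"})):
    print(f"before {target} = f({', '.join(operands)}): live = {sorted(live)}")
```

An optimizer could use such sets to discard temporaries as soon as they are no longer live.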
Why use a Compiler?
• A compiler can broadly be divided into two phases based on the work
they do.
• Analysis Phase
• Known as the front-end of the compiler, the analysis phase
reads the source program, divides it into its core parts and then checks for lexical,
syntax and semantic errors.
• The analysis phase generates an intermediate representation of the source
program and a symbol table, which are fed to the synthesis phase as input.
• Synthesis Phase
• Known as the back-end of the compiler, the synthesis phase
generates the target program from the intermediate representation
of the source code and the symbol table.
• A compiler can have many phases and passes.
• Pass:
• A pass refers to one traversal of the entire program by the compiler.
• Phase:
• A phase of a compiler is a distinguishable stage, which takes input from
the previous stage, processes it, and yields output that can be used as input
for the next stage.
• A pass can have more than one phase.
STRUCTURE OF THE COMPILER DESIGN
What are the Phases of Compiler Design?
• A compiler operates in several phases; each phase transforms the source program from
one representation to another.
• Every phase takes its input from the previous phase and feeds its output to the next phase
of the compiler.
• Together, the phases split the source code into tokens, build parse trees,
and optimize the code.
• There are 6 phases in a compiler.
• Each of these phases helps convert the high-level language into machine code.
• The phases of a compiler are:
• Lexical analysis
• Syntax analysis
• Semantic analysis
• Intermediate code generator
• Code optimizer
• Code generator
Phase 1: Lexical Analysis
• Lexical analysis is the first phase, in which the compiler scans the source code.
• The scan runs left to right, character by character, and groups the
characters into tokens.
• Here, the character stream from the source program is grouped into
meaningful sequences by identifying the tokens.
• It enters the corresponding tokens into the symbol table and
passes each token to the next phase.
• The primary functions of this phase are:
• Identify the lexical units in the source code
• Classify lexical units into classes such as constants and reserved words, and enter them
in different tables
• Ignore comments in the source program
• Identify tokens that are not part of the language
Phases of Compiler
• Example: x = y + 10
• Tokens
x     identifier
=     assignment operator
y     identifier
+     addition operator
10    number
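The token table above can be reproduced with a small regular-expression scanner. This is only a sketch: the token class names, `TOKEN_SPEC`, and `tokenize` are illustrative choices, and only the token kinds needed for this one example are covered.

```python
import re

# Token classes and their patterns (illustrative names, not from the slides).
# Order matters: earlier alternatives win when several could match.
TOKEN_SPEC = [
    ("NUMBER",     r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ASSIGN",     r"="),
    ("PLUS",       r"\+"),
    ("SKIP",       r"\s+"),   # whitespace is discarded
    ("MISMATCH",   r"."),     # any other character is illegal
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Scan left to right and return a list of (token_class, lexeme) pairs."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"illegal character {lexeme!r} at position {match.start()}")
        tokens.append((kind, lexeme))
    return tokens


print(tokenize("x = y + 10"))
# [('IDENTIFIER', 'x'), ('ASSIGN', '='), ('IDENTIFIER', 'y'), ('PLUS', '+'), ('NUMBER', '10')]
```

Listing the patterns as data mirrors how a scanner generator such as LEX is driven by regular expressions rather than hand-written character loops.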
Phase 2: Syntax Analysis
• Syntax analysis is all about discovering structure in code.
• It determines whether or not a text follows the expected format.
• The main aim of this phase is to check whether the source code
written by the programmer is syntactically correct.
• Syntax analysis applies the grammar rules of the specific programming
language, constructing a parse tree from the tokens.
• It also determines the structure of the source program according to the
grammar of the language.
• Here is a list of tasks performed in this phase:
• Obtain tokens from the lexical analyzer
• Check whether the expression is syntactically correct
• Report all syntax errors
• Construct a hierarchical structure known as a parse tree
• Example
• Any identifier/number is an expression
• If x is an identifier and y + 10 is an expression, then x = y + 10 is a statement.
• Consider the parse tree for the following expression:
• (a+b)*c
In the parse tree:
• Interior node: a record with an operator field and two fields for its
children
• Leaf: a record with two or more fields; one for the token and the others for
information about the token
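The node layout described above can be sketched with a small recursive-descent parser for expressions such as (a+b)*c. The grammar (`expr`, `term`, `factor`), the class names `Interior` and `Leaf`, and the function `parse` are assumptions for this example; the slides do not prescribe a parsing method. Strictly speaking the result is a syntax tree, since parentheses are not kept as nodes.

```python
import re
from dataclasses import dataclass


@dataclass
class Leaf:
    token: str    # token class: "id" or "num"
    value: str    # other information about the token (here, its lexeme)


@dataclass
class Interior:
    op: str       # operator field
    left: object  # first child
    right: object # second child


def parse(source):
    """Parse +, * and parentheses; characters outside this tiny language are skipped."""
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|[()+*]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():                       # expr -> term ('+' term)*
        node = term()
        while peek() == "+":
            advance()
            node = Interior("+", node, term())
        return node

    def term():                       # term -> factor ('*' factor)*
        node = factor()
        while peek() == "*":
            advance()
            node = Interior("*", node, factor())
        return node

    def factor():                     # factor -> id | num | '(' expr ')'
        tok = advance()
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or tok in {")", "+", "*"}:
            raise SyntaxError(f"unexpected token {tok!r}")
        return Leaf("num" if tok.isdigit() else "id", tok)

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("(a+b)*c"))
```

For (a+b)*c the root is the `*` node, whose left child is the `+` node over a and b and whose right child is the leaf c.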
Phase 3: Semantic Analysis
• Semantic analysis checks the semantic consistency of the code.
• It ensures that the components of the program fit together
meaningfully.
• It uses the syntax tree of the previous phase along with the symbol table to
verify that the given source code is semantically consistent.
• It also checks whether the code conveys an appropriate meaning.
• The semantic analyzer checks for type mismatches, incompatible operands,
functions called with improper arguments, undeclared variables, etc.
• Functions of the semantic analysis phase are:
• Store the type information gathered, saving it in the symbol table or syntax tree
• Perform type checking
• Report a semantic error on a type mismatch when no type conversion rule
satisfies the desired operation
• Collect type information and check for type compatibility
• Check whether the source language permits the operands
• Example
• float x = 20.2;
• float y = x*30;
• In the above code, the semantic analyzer converts the integer 30 to the float
30.0 before the multiplication
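A minimal sketch of this check follows, assuming a language with only `int` and `float` types. The tuple-based tree format, the function name `check`, and the `int_to_float` node name are illustrative assumptions.

```python
def check(node, symbols):
    """Type-check an expression tree and return (typed_tree, type).

    Nodes are tuples: ("num", value), ("id", name) or (operator, left, right).
    symbols maps declared variable names to "int" or "float".
    """
    kind = node[0]
    if kind == "num":
        return node, ("float" if isinstance(node[1], float) else "int")
    if kind == "id":
        if node[1] not in symbols:
            raise NameError(f"undeclared variable {node[1]!r}")
        return node, symbols[node[1]]

    op, left, right = node
    left, left_type = check(left, symbols)
    right, right_type = check(right, symbols)
    if left_type == right_type:
        return (op, left, right), left_type
    if {left_type, right_type} != {"int", "float"}:
        raise TypeError(f"incompatible operands for {op!r}: {left_type}, {right_type}")
    # Mixed int/float: wrap the int operand in an explicit conversion node.
    if left_type == "int":
        left = ("int_to_float", left)
    else:
        right = ("int_to_float", right)
    return (op, left, right), "float"


# x * 30 with x declared as float, as in the example above.
tree, result_type = check(("*", ("id", "x"), ("num", 30)), {"x": "float"})
print(tree, result_type)
```

Making the conversion an explicit node means later phases never have to reason about mixed-type operations.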
Phase 4: Intermediate Code Generation
• Example: three-address code for the statement total = count + rate * 5:
• t1 := int_to_float(5)
• t2 := rate * t1
• t3 := count + t2
• total := t3
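The four instructions above can be produced by a post-order walk over the expression tree that gives each intermediate result a fresh temporary. The following is a sketch under assumptions: the tuple-based tree format and the name `three_address` are invented for the example, and the tree is taken to be the one for total = count + rate * 5 after the integer 5 has been wrapped in a conversion node.

```python
def three_address(target, tree):
    """Return three-address instructions that assign the value of tree to target.

    Leaves are names or constants; interior nodes are (operator, left, right)
    tuples, and ("int_to_float", operand) marks a type conversion.
    """
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def gen(node):
        if not isinstance(node, tuple):          # leaf: use the name or constant directly
            return str(node)
        if node[0] == "int_to_float":            # unary conversion node
            operand = gen(node[1])
            temp = new_temp()
            code.append(f"{temp} := int_to_float({operand})")
            return temp
        op, left, right = node
        left_name = gen(left)                    # children first (post-order)
        right_name = gen(right)
        temp = new_temp()
        code.append(f"{temp} := {left_name} {op} {right_name}")
        return temp

    result = gen(tree)
    code.append(f"{target} := {result}")
    return code


tree = ("+", "count", ("*", "rate", ("int_to_float", 5)))
for line in three_address("total", tree):
    print(line)
```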
Phase 5: Code Optimization
Symbol Table Management
• A symbol table contains a record for each identifier, with fields for the
attributes of the identifier.
• This component makes it easy for the compiler to find an
identifier's record and retrieve it quickly.
• The symbol table also supports scope management.
• The symbol table and error handler interact with all the phases, and the
symbol table is updated accordingly.
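One common way to combine per-identifier records with scope management is a stack of dictionaries, one per open scope. The class below is a sketch of that idea; the class and method names are assumptions for this example, not an interface defined in the course material.

```python
class SymbolTable:
    """A stack of scopes; each scope maps an identifier to its attribute record."""

    def __init__(self):
        self.scopes = [{}]                  # index 0 is the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        if len(self.scopes) == 1:
            raise RuntimeError("cannot exit the global scope")
        self.scopes.pop()

    def declare(self, name, **attributes):
        current = self.scopes[-1]
        if name in current:
            raise NameError(f"{name!r} is already declared in this scope")
        current[name] = attributes

    def lookup(self, name):
        # Search from the innermost scope outwards so inner declarations shadow outer ones.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


table = SymbolTable()
table.declare("x", type="float")
table.enter_scope()
table.declare("x", type="int")              # shadows the outer x
print(table.lookup("x"))                    # record of the inner x
table.exit_scope()
print(table.lookup("x"))                    # record of the outer x
```

Dictionary lookups keep retrieval fast, and the duplicate check in `declare` is where an error for multiply declared identifiers would be raised.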
Error Handling Routine
• During compilation, errors may occur in any of the following phases:
• Lexical analyzer: Wrongly spelled tokens
• Syntax analyzer: Missing parenthesis
• Intermediate code generator: Mismatched operands for an operator
• Code Optimizer: When the statement is not reachable
• Code Generator: Unreachable statements
• Symbol table: Identifiers declared more than once
• The most common errors are invalid character sequences in scanning, invalid token
sequences in parsing, and type and scope errors in semantic analysis.
• An error may be encountered in any of the above phases.
• After finding an error, the phase needs to deal with it so that the
compilation process can continue.
• These errors are reported to the error handler, which handles them so that
compilation can proceed.
• Generally, errors are reported in the form of messages.
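The report-and-continue behaviour can be sketched with a shared error handler that every phase writes to. The `ErrorHandler` class, its message format, and the toy phase `scan_identifiers` are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorHandler:
    """Collects error messages from all phases instead of stopping at the first one."""
    messages: list = field(default_factory=list)

    def report(self, phase, line, text):
        self.messages.append(f"{phase} error, line {line}: {text}")

    def has_errors(self):
        return bool(self.messages)


def scan_identifiers(lines, errors):
    """Toy lexical phase: accept alphabetic words, report anything else, and keep going."""
    words = []
    for number, line in enumerate(lines, start=1):
        for word in line.split():
            if word.isalpha():
                words.append(word)
            else:
                errors.report("Lexical", number, f"invalid token {word!r}")
    return words


errors = ErrorHandler()
print(scan_identifiers(["abc 1x", "ok"], errors))
for message in errors.messages:
    print(message)
```

Because the phase records the problem and moves on, a single run can surface several errors to the programmer at once.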