Introduction_Compiler course

The compiler course aims to teach modern compiler techniques, covering topics such as lexical analysis, syntax analysis, and code generation, with references including 'Compilers – Principles, Techniques and Tools'. It involves understanding the roles of various components like preprocessors, assemblers, linkers, and loaders, as well as the differences between compilers and interpreters. The course also addresses the historical development of compilers and their applications in programming language implementation and optimization.

Uploaded by

yabera528
Copyright
© All Rights Reserved
Compiler course

Introduction
Outline
• Scope of the course
• Disciplines involved in it
• Abstract view for a compiler
• Front-end and back-end tasks
• Modules
Course scope
• Aim:
• To learn techniques of a modern compiler
• Main reference:
• Compilers – Principles, Techniques and Tools, Second Edition by Alfred V. Aho,
Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman
• Supplementary references:
• Modern compiler construction in Java 2nd edition
• Advanced Compiler Design and Implementation by Muchnick
Subjects
• Lexical analysis (Scanning)
• Syntax Analysis (Parsing)
• Syntax Directed Translation
• Intermediate Code Generation
• Run-time environments
• Code Generation
• Machine Independent Optimization
Compiler learning
• Isn’t it an old discipline?
• Yes, it is a well-established discipline
• Algorithms, methods and techniques were researched and developed in the early
stages of computer science growth
• There are many compilers around and many tools to generate them
automatically
• So, why do we need to learn it?
• You may never write a full compiler
• But the techniques we learn are useful in many tasks, like writing an interpreter
for a scripting language, validation checking for forms and so on
Disciplines involved
• Algorithms
• Languages and machines
• Operating systems
• Computer architectures
What is a Compiler?

• A compiler is a translator program that takes a program written in a high-level language (HLL), the source program (source code), and translates it into an equivalent program in a machine-level language (MLL), the target program.
• It translates code written in one programming language into another language without changing the meaning of the code.
• More generally, a compiler is a program that translates an executable program in one language into an executable program in another language.
• An important part of a compiler is reporting errors to the programmer.
• Executing a program written in an HLL basically has two parts:
• The source program must first be compiled (translated) into an object program.
• Then the resulting object program is loaded into memory and executed.
• The compiler also makes the end code efficient, that is, optimized for execution time and memory space.
• We expect the program produced by the compiler to be better, in some way, than the original.
• The compiling process includes basic translation mechanisms and error detection.
• The compilation process goes through lexical, syntax, and semantic analysis at the front end, and code generation and optimization at the back end.
Cont.…
Features of Compilers

• Correctness
• Speed of compilation
• Preserves the meaning of the code
• The speed of the target code
• Recognize legal and illegal program constructs
• Good error reporting/handling
• Code debugging help
Types of Compiler

Following are the different types of Compiler:


• Single Pass Compilers
• Two Pass Compilers
• Multi-pass Compilers
Single Pass Compiler

In a single-pass compiler, the source code is transformed directly into
machine code in one pass. Pascal, for example, was designed so that it could be compiled in a single pass.
Two Pass Compiler

• A two-pass compiler is divided into two sections, viz.
• Front end: It maps legal code into an intermediate representation (IR).
• Back end: It maps the IR onto the target machine.
• The two-pass approach also simplifies the retargeting process, and it allows multiple front ends.
Multi-pass Compilers

• A multi-pass compiler processes the source code or syntax tree of a program several
times.
• It divides a large program into multiple small parts and processes them.
• It develops multiple intermediate codes.
• Each pass takes the output of the previous pass as its input.
• It requires less memory, since each pass handles only part of the work.
• It is also known as a 'Wide Compiler'.
Tasks of Compiler

• The main tasks performed by a compiler are:
• Breaks up the source program into pieces and imposes grammatical structure on them
• Constructs the desired target program from the intermediate representation and also creates the symbol table
• Compiles source code and detects errors in it
• Manages storage of all variables and code
• Supports separate compilation
• Reads and analyzes the entire program, and translates it into a semantically equivalent program
• Translates the source code into object code depending upon the type of machine
History of Compiler

• Important landmarks in the history of compilers are as follows:
• The word "compiler" was first used in the early 1950s by Grace Murray Hopper
• The first complete compiler, for FORTRAN, was built by John Backus and his group between 1954 and 1957 at IBM
• COBOL was the first programming language to be compiled on multiple platforms, in 1960
• The study of scanning and parsing issues was pursued in the 1960s and 1970s to provide a complete solution
Steps for Language processing systems

• Before learning about the concept of compilers, you first need to understand a few other tools that work with compilers.
Cont.
• High-Level Language
• A program in a high-level language such as C may contain directives such as #include or #define.
• High-level languages are closer to humans but far from machines.
• These (#) tags are called preprocessor directives.
• They tell the preprocessor what to do.
• Preprocessor:
• The preprocessor is considered a part of the compiler; it is a tool that produces input for the compiler.
• The preprocessor removes all the #include directives by including the named files (file inclusion) and all the #define directives by macro expansion.
• It deals with file inclusion, macro processing, augmentation, language extension, etc.
• A preprocessor may perform the following functions:
• Macro processing: A preprocessor may allow a user to define macros that are shorthands for longer constructs.
• File inclusion: A preprocessor may include header files into the program text.
• Rational preprocessors: These preprocessors augment older languages with more modern flow-of-control and data-structuring facilities.
• Language extensions: These preprocessors attempt to add capabilities to the language by what amounts to built-in macros.
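The two core preprocessor functions above, file inclusion and macro expansion, can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: included files come from an in-memory dictionary (the hypothetical `files` argument), only object-like macros of the form `#define NAME value` are handled, and the function names are invented for this sketch. It is not how a real C preprocessor is implemented.

```python
import re

def preprocess_lines(source, files, macros):
    """Return the output lines for `source`, handling #include and #define."""
    out = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#include"):
            # File inclusion: splice in the preprocessed text of the named file.
            name = stripped.split('"')[1]
            out.extend(preprocess_lines(files[name], files, macros))
        elif stripped.startswith("#define"):
            # Macro definition: remember NAME -> replacement text.
            _, name, value = stripped.split(None, 2)
            macros[name] = value
        else:
            # Macro expansion: replace whole-word occurrences of each macro name.
            for name, value in macros.items():
                pattern = r"\b%s\b" % re.escape(name)
                line = re.sub(pattern, lambda match, v=value: v, line)
            out.append(line)
    return out

def preprocess(source, files):
    """Preprocess `source`; `files` maps include names to their text."""
    return "\n".join(preprocess_lines(source, files, {}))

headers = {"limits.h": "#define MAX 100\nint n;"}
program = '#include "limits.h"\nint a[MAX];'
print(preprocess(program, headers))
```

The output contains no directives any more: the header text has been spliced in and `MAX` has been replaced by `100`, which is the form the compiler proper receives.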
Cont..
• Assembly Language
• Programmers found it difficult to write or read programs in machine language.
• They began to use a mnemonic (symbol) for each machine instruction, which they would subsequently translate into machine language.
• Such a mnemonic machine language is now called an assembly language.
• It is neither in binary form nor high level.
• It is an intermediate state that is a combination of machine instructions and some other useful data needed for execution.
• Programs known as assemblers were written to automate the translation of assembly language into machine language.
• The input to an assembler program is called the source program; the output is a machine language translation (object program).
• Assembler
• For every platform (hardware + OS) there is an assembler.
• Assemblers are not universal, since each platform has its own.
• It translates assembly language programs into machine code, the language the machine understands.
• The output of an assembler is called an object file, which contains a combination of machine instructions as well as the data required to place these instructions in memory.
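As a rough illustration of what an assembler automates, the sketch below translates mnemonics into numeric opcodes. The instruction set, the opcode values in `OPCODES` and the `(opcode, operand)` output format are all made up for this example; a real assembler targets a specific machine and also handles labels, directives and object-file layout.

```python
# Hypothetical opcode table for a made-up machine.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "HALT": 0xFF}

def assemble(source):
    """Translate assembly text into a list of (opcode, operand) pairs."""
    machine_code = []
    for line in source.splitlines():
        line = line.split(";")[0].strip()   # drop comments and surrounding spaces
        if not line:
            continue                        # skip blank lines
        parts = line.split()
        mnemonic = parts[0].upper()
        # Operands may be decimal or hexadecimal (0x..); a missing operand is 0.
        operand = int(parts[1], 0) if len(parts) > 1 else 0
        machine_code.append((OPCODES[mnemonic], operand))
    return machine_code

program = "LOAD 10 ; get x\nADD 11\nSTORE 12\nHALT"
print(assemble(program))
```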
Cont..
• Linker:
• A linker is a computer program that links and merges various object files together in order to create an executable file.
• These files might have been produced by separate assemblers.
• The major task of a linker is to search for and locate the referenced modules/routines in a program and to determine the memory locations where this code will be loaded, so that the program's instructions have absolute references.
• It converts the relocatable code into absolute code. Running the linked program then results in a running program or an error message (or sometimes both).
• Loader:
• The loader is a part of the operating system and is responsible for loading executable files into memory and executing (running) them.
• It calculates the size of a program (instructions and data) and creates memory space for it.
• It initializes various registers to initiate execution.
Cont..
• Relocatable Machine Code
• Is code whose execution address can be changed.
• It can be loaded at any address and run.
• The addresses within the program are expressed so that they can be adjusted when the program is moved.
• A relocatable program might run at address 0 in one instance, and at 10000 in another.
• The word "relocatable" applies because each module is assembled at a pseudo-address of 0.
• The linker corrects all address references to the proper execution values.
• Absolute machine code
• Absolute code uses absolute addresses: jump to this exact address, read from this exact address. For example, "if equal then branch to 0x1000".
• Cross-compiler:
• A cross-compiler is a compiler that runs on one platform and generates executable code for a different platform.
• Source-to-source Compiler:
• Source-to-source compiler is a term used when the source code of one programming language is translated into the source code of another language.
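The idea of relocatable code described above (assembled at pseudo-address 0, with address references corrected later) can be sketched as follows. The instruction format and the relocation list are hypothetical; real object files record this information in format-specific relocation tables.

```python
def relocate(code, relocations, load_address):
    """Return a copy of `code` in which every address operand listed in
    `relocations` has been shifted by `load_address`."""
    patched = list(code)
    for index in relocations:
        opcode, operand = patched[index]
        patched[index] = (opcode, operand + load_address)
    return patched

# Object code assembled at pseudo-address 0 (made-up instruction format).
object_code = [("LOAD", 3), ("JUMP", 0), ("HALT", 0), ("DATA", 42)]
# Indices of the instructions whose operand is an address, not a plain value.
relocations = [0, 1]

print(relocate(object_code, relocations, 0))      # runs at address 0
print(relocate(object_code, relocations, 10000))  # runs at address 10000
```

Only the operands named in the relocation list change; the constant `42` stays the same wherever the program is loaded.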
Cont..
• Interpreter:
• A program that reads an executable program and produces the results of running that program.
• Usually, this involves executing the source program in some fashion.
• An interpreter, like a compiler, processes a program written in a high-level language, but the two differ in the way they read the input.
• An interpreter reads and translates the program one statement at a time.
• It reads a statement from the input, converts it to an intermediate code, executes it, then takes the next statement in sequence.
• If an error occurs, an interpreter stops execution and reports it.
• Interpreted programs are usually slower than compiled ones.
• A compiler reads the whole source code at once, creates tokens, checks semantics, generates intermediate code, and translates the program as a whole into machine code; this may involve many passes.
• The compiler does not execute the program itself; the generated machine code is executed afterwards.
• When errors occur, a compiler still reads the whole program and can report several errors.
• Our course is mainly about compilers, but many of the same issues arise in interpreters.
Interpreter:

• Languages such as BASIC, SNOBOL, and LISP can be translated using interpreters.
• Java also uses an interpreter: the compiled bytecode is run by the Java Virtual Machine.
• The process of interpretation can be carried out in the following phases.
• 1. Lexical analysis
• 2. Syntax analysis
• 3. Semantic analysis
• 4. Direct Execution
• Advantages:
• Modifications to the user program can easily be made and take effect as execution proceeds.
• The type of object that a variable denotes may change dynamically.
• Debugging a program and finding errors is simpler when the program is interpreted.
• The interpreter for the language makes it machine independent.
• Disadvantages:
• The execution of the program is slower.
• Memory consumption is higher.
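The statement-at-a-time behaviour of an interpreter can be sketched in Python for a toy language of assignment statements. As a loudly stated shortcut, Python's built-in `eval` stands in for the lexical, syntax and semantic analysis phases; the `interpret` function and its return format are inventions for this example.

```python
def interpret(program):
    """Execute `name = expression` statements one at a time.

    Returns (environment, error); execution stops at the first error,
    so later statements are never examined.
    """
    env = {}
    for number, statement in enumerate(program.splitlines(), start=1):
        if not statement.strip():
            continue
        name, _, expr = statement.partition("=")
        try:
            # Analyse and directly execute this one statement.
            env[name.strip()] = eval(expr.strip(), {"__builtins__": {}}, env)
        except Exception as error:
            return env, "line %d: %s" % (number, error)
    return env, None

env, error = interpret("x = 2\ny = x * 3\nz = y / 0\nw = 1")
print(env)    # x and y were assigned before the failure
print(error)  # the error is reported for line 3; w is never reached
```

The first two statements take effect before the error on the third one is discovered, which is the behaviour the bullets above contrast with a compiler that reads the whole program first.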
TRANSLATOR

• A translator is a program that takes as input a program written in one language and produces as output a program in another language.
• Besides program translation, the translator performs another very important role: error detection.
• Any violation of the HLL specification is detected and reported to the programmer. The important roles of a translator are:
• Translating the HLL program input into an equivalent machine-language program.
• Providing diagnostic messages wherever the programmer violates the specification of the HLL.
LIST OF COMPILERS
1. Ada compilers
2. ALGOL compilers
3. BASIC compilers
4. C# compilers
5. C compilers
6. C++ compilers
7. COBOL compilers
8. Common Lisp compilers
9. ECMAScript interpreters
10. Fortran compilers
11. Java compilers
12. Pascal compilers
13. PL/I compilers
14. Python compilers
15. Smalltalk compilers
Compiler Construction Tools

• Compiler construction tools were introduced as computer-related technologies spread all over the world.
• They are also known as compiler-compilers, compiler-generators or translator-writing systems.
• These tools use a specific language or algorithm for specifying and implementing a component of the compiler.
• The following are examples of compiler construction tools.
• Scanner generators:
• This tool takes regular expressions as input and generates a lexical analyzer.
• For example, LEX for the Unix operating system.
• Syntax-directed translation engines:
• These software tools produce intermediate code by using the parse tree.
• The goal is to associate one or more translations with each node of the parse tree.
• Parser generators:
• A parser generator takes a grammar as input and automatically generates source code that can parse streams of characters according to that grammar.
• Automatic code generators:
• Take intermediate code and convert it into machine language.
• Data-flow engines:
• This tool is helpful for code optimization.
• Here, information is supplied by the user, and the intermediate code is analyzed to find relations among its parts.
• This is also known as data-flow analysis.
• It helps you find out how values are transmitted from one part of the program to another.
Why use a Compiler?

• A compiler checks the entire program, so syntax errors and many semantic errors are caught before execution
• The executable file is optimized by the compiler, so it executes faster
• Allows you to create internal structures in memory
• There is no need to execute the program on the same machine it was built on
• Translates the entire program into another language
• Generates files on disk
• Links the files into an executable format
• Checks for syntax errors and data types
• Helps you to enhance your understanding of language semantics
• Helps to handle language performance issues
• Opportunity for a non-trivial programming project
• The techniques used for constructing a compiler can be useful for other purposes as well
Application of Compilers

• Compiler design helps the full implementation of high-level programming languages
• Supports optimization for parallelism in computer architectures
• Design of new memory hierarchies of machines
• Widely used for translating programs
• Used with other software productivity tools
Compiler Design – Architecture

• A compiler can broadly be divided into two phases based on the way
it compiles.
• Analysis Phase
• Known as the front end of the compiler, the analysis phase
reads the source program, divides it into core parts and then checks for lexical,
syntax and semantic errors.
• The analysis phase generates an intermediate representation of the source
program and a symbol table, which are fed to the synthesis phase as input.
Cont.
• Synthesis Phase
• Known as the back end of the compiler, the synthesis phase
generates the target program with the help of the intermediate
code representation and the symbol table.
• A compiler can have many phases and passes.
• Pass:
• A pass refers to one traversal of the compiler through the entire program.
• Phase:
• A phase of a compiler is a distinguishable stage, which takes input from
the previous stage, processes it and yields output that can be used as input
for the next stage.
• A pass can have more than one phase.
STRUCTURE OF THE COMPILER DESIGN
What are the Phases of Compiler Design?
• A compiler operates in phases; each phase transforms the source program from
one representation to another.
• Every phase takes input from the previous phase and feeds its output to the next phase
of the compiler.
• Together, the phases convert the source code by dividing it into tokens, creating parse trees,
and optimizing the code.
• There are 6 phases in a compiler.
• Each of these phases helps in converting the high-level language to machine code.
• The phases of a compiler are:
• Lexical analysis
• Syntax analysis
• Semantic analysis
• Intermediate code generator
• Code optimizer
• Code generator
Phase 1: Lexical Analysis
• Lexical analysis is the first phase, in which the compiler scans the source code.
• The scan proceeds left to right, character by character, and groups the
characters into tokens.
• Here, the character stream from the source program is grouped into
meaningful sequences by identifying the tokens.
• It enters the corresponding tokens into the symbol table and
passes each token to the next phase.
• The primary functions of this phase are:
• Identify the lexical units in the source code
• Classify lexical units into classes such as constants and reserved words, and enter them
in different tables
• Ignore comments in the source program
• Identify tokens that are not part of the language
Phases of Compiler
• Example: x = y + 10
• Tokens:
x     identifier
=     assignment operator
y     identifier
+     addition operator
10    number
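A minimal scanner for the example above can be sketched with Python's `re` module. The token names and the `TOKEN_SPEC` table are choices made for this illustration; a production lexer would cover the full token set of its language, including keywords and comments.

```python
import re

# (token class, regular expression) pairs; order matters for alternation.
TOKEN_SPEC = [
    ("NUMBER",     r"\d+(?:\.\d+)?"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ASSIGN",     r"="),
    ("ADD",        r"\+"),
    ("SKIP",       r"\s+"),
    ("MISMATCH",   r"."),
]
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))

def tokenize(text):
    """Scan `text` left to right and return a list of (class, lexeme) pairs."""
    tokens = []
    for match in MASTER.finditer(text):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue                      # whitespace carries no meaning here
        if kind == "MISMATCH":
            # A character that belongs to no token class is a lexical error.
            raise SyntaxError("unexpected character %r" % lexeme)
        tokens.append((kind, lexeme))
    return tokens

for kind, lexeme in tokenize("x = y + 10"):
    print(lexeme, kind)
```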
Phase 2: Syntax Analysis
• Syntax analysis is all about discovering structure in code.
• It determines whether or not a text follows the expected format.
• The main aim of this phase is to check whether the source code
written by the programmer is syntactically correct.
• Syntax analysis applies the rules of the specific programming
language, constructing the parse tree with the help of tokens.
• It also determines the structure of the source program according to the grammar
(syntax) of the language.
• Here is a list of tasks performed in this phase:
• Obtain tokens from the lexical analyzer
• Check whether the expression is syntactically correct
• Report all syntax errors
• Construct a hierarchical structure known as a parse tree
Cont. …
• Example
• Any identifier or number is an expression.
• If x is an identifier and y+10 is an expression, then x = y+10 is a statement.
• Consider the parse tree for the following example:
• (a+b)*c
In Parse Tree
• Interior node: a record with an operator field and two fields for the
children
• Leaf: a record with two or more fields; one for the token and the others for
information about the token
• Ensures that the components of the program fit together
meaningfully
• Gathers type information and checks for type compatibility
• Checks that operands are permitted by the source language
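The tree for (a+b)*c can be built with a small recursive-descent parser. The sketch below is one possible approach, not the method prescribed by the course: it assumes a grammar with the four arithmetic operators and parentheses, represents an interior node as a tuple `(operator, left, right)` and a leaf as the token text, and silently ignores characters its tokenizer does not recognise.

```python
import re

def parse(text):
    """Parse an arithmetic expression into nested (op, left, right) tuples."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        # factor -> '(' expr ')' | identifier | number
        if peek() == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if peek() is None or peek() in ("+", "-", "*", "/", ")"):
            raise SyntaxError("expected identifier or number")
        return advance()

    def term():
        # term -> factor (('*' | '/') factor)*
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def expr():
        # expr -> term (('+' | '-') term)*
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError("unexpected token %r" % peek())
    return tree

print(parse("(a+b)*c"))
```

The parentheses force the addition to become the left child of the multiplication; without them, `parse("a+b*c")` places the multiplication lower in the tree because `term` binds tighter than `expr`.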
Phase 3: Semantic Analysis
• Semantic analysis checks the semantic consistency of the code.
• It uses the syntax tree of the previous phase along with the symbol table to
verify that the given source code is semantically consistent.
• It also checks whether the code conveys an appropriate meaning.
• The semantic analyzer checks for type mismatches, incompatible operands, a
function called with improper arguments, an undeclared variable, etc.
• The functions of the semantic analysis phase are:
• Stores the type information gathered and saves it in the symbol table or syntax tree
• Performs type checking
• In the case of a type mismatch, where no type conversion rule
satisfies the desired operation, a semantic error is shown
• Collects type information and checks for type compatibility
• Checks whether the source language permits the operands
Cont. …
• Example
• float x = 20.2;
• float y = x*30;
• In the above code, the semantic analyzer will convert the integer 30 to the float
30.0 before multiplication
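The check behind this example can be sketched as a function that, given the types of the two operands, returns the result type and any conversion that must be inserted. The type names, the `int_to_float` label and the rule that only `int` and `float` are numeric are assumptions made for this illustration.

```python
def check_binary(op, left_type, right_type):
    """Type-check a numeric binary operation.

    Returns (result_type, coercions), where coercions lists which operand
    must be converted; raises TypeError on a type mismatch.
    """
    numeric = ("int", "float")
    if left_type not in numeric or right_type not in numeric:
        raise TypeError("operator %s is not defined for %s and %s"
                        % (op, left_type, right_type))
    if left_type == right_type:
        return left_type, []
    # Mixed int/float: widen the int operand, as in float y = x * 30.
    side = "left" if left_type == "int" else "right"
    return "float", [(side, "int_to_float")]

print(check_binary("*", "float", "int"))
```

For `x * 30` the right operand is marked for conversion, which corresponds to turning 30 into 30.0 before the multiplication.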
Phase 4: Intermediate Code Generation

• Once the semantic analysis phase is over, the compiler generates
intermediate code for the target machine.
• It represents a program for some abstract machine.
• Intermediate code lies between the high-level language and the machine-level language.
• This intermediate code needs to be generated in such a manner that
it is easy to translate into the target machine code.
• Functions of intermediate code generation:
• It is generated from the semantic representation of the source program
• Holds the values computed during the process of translation
• Makes it easier to translate the program into the target language
• Maintains the precedence ordering of the source language
• Holds the correct number of operands for each instruction
Cont. ..
• Example
• total = count + rate * 5
• Intermediate code using the three-address code method is:

• t1 := int_to_float(5)
• t2 := rate * t1
• t3 := count + t2
• total := t3
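The sequence above can be produced by a post-order walk over the expression tree, creating a fresh temporary for each intermediate result. The sketch assumes the tree is given as nested `(operator, left, right)` tuples and that every variable is a float, so each integer literal is wrapped in `int_to_float`; both are simplifications for illustration.

```python
def to_three_address(target, tree):
    """Translate `target = tree` into a list of three-address statements."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return "t%d" % counter

    def emit(node):
        if isinstance(node, int):
            # Assumption: variables are floats, so integer literals are converted.
            temp = new_temp()
            code.append("%s := int_to_float(%d)" % (temp, node))
            return temp
        if isinstance(node, str):
            return node                   # a variable is already an address
        op, left, right = node
        left_addr = emit(left)            # operands first (post-order)
        right_addr = emit(right)
        temp = new_temp()
        code.append("%s := %s %s %s" % (temp, left_addr, op, right_addr))
        return temp

    result = emit(tree)
    code.append("%s := %s" % (target, result))
    return code

for line in to_three_address("total", ("+", "count", ("*", "rate", 5))):
    print(line)
```

Because the multiplication is nested inside the addition in the tree, its code is emitted first, which is how the precedence ordering of the source language is preserved.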
Phase 5: Code Optimization

• The next phase is code optimization of the intermediate code.
• This phase removes unnecessary lines of code and rearranges the sequence
of statements to speed up the execution of the program without
wasting resources.
• The main goal of this phase is to improve on the intermediate code to
generate code that runs faster and occupies less space.
• The primary functions of this phase are:
• Helps establish a trade-off between execution speed and compilation speed
• Improves the running time of the target program
• Generates streamlined code, still in intermediate representation
• Removes unreachable code and unused variables
• Moves statements whose results do not change inside a loop out of the loop
Cont..
• Example:
• Consider the following code
• a = int_to_float(10)
• b = c * a
• d = e + b
• f = d
• Can become
• b = c * 10.0
• f = e + b
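The transformation in this example combines two simple optimizations: folding the constant conversion at compile time and eliminating the copy into `f`. A minimal sketch follows, assuming instructions are `(dest, op, arg1, arg2)` tuples and that the caller states which names are compiler temporaries that are assigned once and never used later (here `a` and `d`). A real optimizer would derive that from data-flow analysis rather than take it as an argument.

```python
def optimize(code, temporaries):
    """Apply constant folding and copy elimination to three-address code."""
    constants = {}
    result = []
    for dest, op, arg1, arg2 in code:
        # Substitute already-known constant values into the operands.
        arg1 = constants.get(arg1, arg1)
        arg2 = constants.get(arg2, arg2)
        if op == "int_to_float" and isinstance(arg1, int) and dest in temporaries:
            # Constant folding: perform the conversion at compile time.
            constants[dest] = float(arg1)
            continue
        if (op == "copy" and result and result[-1][0] == arg1
                and arg1 in temporaries):
            # Copy elimination: let the previous instruction write dest directly.
            _, prev_op, prev1, prev2 = result.pop()
            result.append((dest, prev_op, prev1, prev2))
            continue
        result.append((dest, op, arg1, arg2))
    return result

code = [
    ("a", "int_to_float", 10, None),
    ("b", "*", "c", "a"),
    ("d", "+", "e", "b"),
    ("f", "copy", "d", None),
]
for instruction in optimize(code, temporaries={"a", "d"}):
    print(instruction)
```

Four instructions become two, matching the slide: `b = c * 10.0` followed by `f = e + b`.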
Phase 6: Code Generation

• Code generation is the final phase of a compiler.
• It takes its input from the code optimization phase and produces the target code or
object code as a result.
• The objective of this phase is to allocate storage and generate relocatable
machine code.
• It also allocates memory locations for the variables.
• The instructions in the intermediate code are converted into machine
instructions.
• This phase converts the optimized intermediate code into the target language.
• The target language is the machine code.
• Therefore, memory locations and registers are also selected and allotted
during this phase.
• The code generated by this phase is what is executed to take inputs and generate the
expected outputs.
Cont…
• Example:
• a = b + 60.0
• Could be translated using registers as:
• MOVF b, R1
• ADDF #60.0, R1
• MOVF R1, a
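How one three-address statement such as a = b + 60.0 might be mapped to register instructions can be sketched as below. The sketch assumes a hypothetical two-operand instruction format `OP source, destination`, a `#` prefix for constants, and a single fixed register; register allocation and instruction selection in a real code generator are far more involved.

```python
# Hypothetical floating-point opcodes for a made-up target machine.
OPCODES = {"+": "ADDF", "-": "SUBF", "*": "MULF", "/": "DIVF"}

def operand(value):
    """Format an operand: constants become immediates with a leading '#'."""
    return "#%s" % value if isinstance(value, float) else value

def generate(dest, op, left, right, register="R1"):
    """Generate target code for the statement `dest = left op right`."""
    return [
        "MOVF %s, %s" % (operand(left), register),              # load left operand
        "%s %s, %s" % (OPCODES[op], operand(right), register),  # apply the operator
        "MOVF %s, %s" % (register, dest),                       # store the result
    ]

for line in generate("a", "+", "b", 60.0):
    print(line)
```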
Symbol Table Management

• A symbol table contains a record for each identifier, with fields for the
attributes of the identifier.
• This component makes it easier for the compiler to find the
identifier's record and retrieve it quickly.
• The symbol table also helps with scope management.
• The symbol table and the error handler interact with all the phases, and
the symbol table is updated correspondingly.
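One common way to combine fast lookup with scope management is a stack of dictionaries, one per open scope. The class below is a minimal sketch of that idea; the method names and the use of keyword arguments for attributes are choices made for this example.

```python
class SymbolTable:
    """A symbol table with nested scopes, kept as a stack of dictionaries."""

    def __init__(self):
        self.scopes = [{}]                # the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, **attributes):
        """Record an identifier and its attributes in the innermost scope."""
        scope = self.scopes[-1]
        if name in scope:
            raise NameError("%s is already declared in this scope" % name)
        scope[name] = attributes

    def lookup(self, name):
        """Find the nearest enclosing declaration of `name`."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError("%s is not declared" % name)

table = SymbolTable()
table.declare("x", type="float")
table.enter_scope()
table.declare("x", type="int")            # shadows the outer x
print(table.lookup("x"))
table.exit_scope()
print(table.lookup("x"))
```

Declaring the same name twice in one scope raises an error, which corresponds to the "multiple declared identifiers" error attributed to symbol tables in the error handling list.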
Error Handling Routine

• In the compiler design process, errors may occur in all of the phases given below:
• Lexical analyzer: Wrongly spelled tokens
• Syntax analyzer: Missing parenthesis
• Intermediate code generator: Mismatched operands for an operator
• Code optimizer: When the statement is not reachable
• Code generator: Unreachable statements
• Symbol tables: Error of multiple declared identifiers
• The most common errors are invalid character sequences in scanning, invalid token
sequences in parsing, and type and scope errors in semantic analysis.
• An error may be encountered in any of the above phases.
• After finding an error, the phase needs to deal with it so that the
compilation process can continue.
• These errors are reported to the error handler, which handles them so that
compilation can proceed.
• Generally, the errors are reported in the form of messages.
