
Ch1 Introduction

Uploaded by

RAHEL YEKOYE
Copyright
© All Rights Reserved

Chapter 1

Introduction to compiler design

Chapter – 1 : Introduction to Compiler Design 1 Bahir Dar Institute of Technology


Contents
▪ The Evolution of Programming Languages
▪ What is a compiler?
▪ History of compilers
▪ Programs related to compilers/cousins of compiler
▪ Why Study Compilers?
▪ Analysis of the source program
▪ Phases of Compiler Design
• Scanner
• Parser
• Semantic Analyzer
• Intermediate Code Generator
• Code Optimizer
• Code Generator
▪ Symbol Tables and Error Handling
▪ Compiler Construction Tools
The Evolution of Programming Languages
▪ 1940s: the first electronic computers were invented
• They were programmed in machine language, as sequences of 0s and 1s
• Tedious, error-prone, machine-dependent, and hard to understand and modify, but fast to run
▪ Early 1950s: mnemonic assembly languages were developed
• At first they were just mnemonic representations of machine instructions; later, macro instructions were added
▪ Latter half of the 1950s: higher-level languages were developed
• Fortran for scientific computation, Cobol for business data processing, and Lisp for symbolic computation



The Evolution of Programming Languages …
• In the following decades, many more languages were created; today there are thousands of programming languages
▪ They can be classified in a variety of ways.
• Classification based on generation
• First-generation languages: machine languages
• Second-generation languages: assembly languages
• Third-generation languages: higher-level languages such as Fortran, Cobol, Lisp, C, C++, C#, and Java
• Fourth-generation languages: languages designed for specific applications, e.g., NOMAD for report generation and SQL for database queries
• Fifth-generation languages: logic- and constraint-based languages such as Prolog and OPS5
The Evolution of Programming Languages …
▪ Another classification
• Imperative languages: a program specifies how a computation is to be done, e.g., C, C++, C#, and Java
• Declarative languages: a program specifies what computation is to be done, e.g., Prolog, ML, and Haskell
▪ Within the declarative and imperative families, there are several important subclasses:
• von Neumann languages, e.g., C, Fortran
• Object-oriented languages, e.g., C++, Java
• Scripting languages, e.g., Ruby, PHP, Perl, Python
• Logic- or constraint-based languages, e.g., Prolog



What is a compiler?
▪ A computer's CPU can execute only very simple, primitive operations (move, add, …)
• Recall this from your study of assembly language or computer organization
▪ Hence, a program for a computer must ultimately be expressed in machine language
▪ However, writing machine language directly is a tedious and error-prone process
• That is why high-level programming languages are used
▪ Programs written in high-level languages can be very different from machine language
• So some means of bridging the gap is required
• This is where the compiler comes in.



What is a compiler? …
▪ A compiler is
• a program that translates
• a program written in a high-level programming language (suitable for human programmers) into
• low-level machine language (required by computers).
• More generally, a compiler is a program that takes a program written in a source language and translates it into an equivalent program in a target language.
[Figure: source program (normally written in a high-level programming language) → COMPILER → target program (normally the equivalent program in machine code or assembly language); the compiler also produces error messages.]



History of compilers
▪ 1940s:
• Early stored-program computers were programmed in machine language.
• Later, assembly languages were developed.
▪ 1950s:
• Early high-level languages such as FORTRAN were developed.
• Compiler writing was a huge task: the first FORTRAN compiler took about 18 person-years.
▪ 1960s onward:
• Compilers have been studied intensively.
• With software tools, a compiler can now be built in a few months.
Programs related to compilers (COUSINS OF COMPILER)
▪ There are other translators/programs that are related to, or used together with, compilers and that often come with compilers in a complete language development environment.
▪ In general, a translator is a program that translates one language into another.
▪ Types of translator: 1. Interpreter 2. Compiler 3. Assembler
▪ Interpreters - directly execute the operations specified in the source program on inputs supplied by the user
• They do not produce a target program as a translation


Programs related to compilers (COUSINS OF COMPILER)
▪ Compilers vs. Interpreters
• A compiler translates the entire program at once, whereas an interpreter translates and executes the program one statement (line) at a time.
• Languages usually implemented with compilers: FORTRAN, COBOL, C, C++, Pascal, PL/1
• Languages usually implemented with interpreters: Lisp, Scheme, BASIC, APL, Perl, Python, Smalltalk
• Pros of compilers: fast execution (an executable file is created)
• Cons of compilers: slow processing (the translation step), harder debugging (improved by IDEs), more memory required (due to object code)
• Pros of interpreters: easy debugging, fast development, lower memory requirement
• Cons of interpreters: not suited to large projects, slower execution
Programs related to compilers (COUSINS OF COMPILER)
Compilers: translate a source (human-writable) program into an executable (machine-readable) program.
Interpreters: convert a source program and execute it at the same time.
Ideally:
• Compiler: source code → Compiler → executable; then input data → executable → output data
• Interpreter: source code + input data → Interpreter → output data
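To make the difference concrete, here is a minimal Python sketch for a hypothetical toy expression language written as nested tuples such as ("+", 2, ("*", 3, "x")). `interpret` executes the source directly on every run, while `compile_expr` translates it once into instructions for an assumed stack machine that `run` then executes on different input data. The tuple format and the PUSH/LOAD/ADD/MUL opcodes are illustrative assumptions, not any real language or machine.

```python
# Hypothetical toy language: an expression is a number, a variable name,
# or a tuple (op, left, right) with op in {"+", "*"}.

def interpret(expr, env):
    """Interpreter: evaluate the source expression directly, on every run."""
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        return env[expr]                      # variable looked up at run time
    op, left, right = expr
    a, b = interpret(left, env), interpret(right, env)
    return a + b if op == "+" else a * b

def compile_expr(expr):
    """Compiler: translate the expression once into stack-machine code."""
    if isinstance(expr, (int, float)):
        return [("PUSH", expr)]
    if isinstance(expr, str):
        return [("LOAD", expr)]
    op, left, right = expr
    opcode = "ADD" if op == "+" else "MUL"
    return compile_expr(left) + compile_expr(right) + [(opcode,)]

def run(code, env):
    """Execute the translated (target) program on input data."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        elif instr[0] == "LOAD":
            stack.append(env[instr[1]])
        else:
            b, a = stack.pop(), stack.pop()   # right operand is on top
            stack.append(a + b if instr[0] == "ADD" else a * b)
    return stack.pop()

program = ("+", 2, ("*", 3, "x"))             # 2 + 3 * x
code = compile_expr(program)                  # translate once ...
print(run(code, {"x": 4}), run(code, {"x": 10}))   # ... run many times: 14 32
print(interpret(program, {"x": 4}))           # walks the source each time: 14
```

Translating once and running many times is where the compiled version gains speed; the interpreter pays the cost of walking the source on every execution.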



Programs related to compilers (COUSINS OF COMPILER)
▪ Assemblers - convert a program in assembly language to its
equivalent program in machine language

▪ Linkers - a computer program that takes one or more object files generated by compilers or assemblers and combines them into a single executable program.



Programs related to compilers (COUSINS OF COMPILER)
▪ Loaders – load an executable into memory and start it running.
▪ Editors – programs used to write/edit source code
▪ Debuggers – programs used to locate execution errors in a compiled program
▪ Preprocessors –
• A source program may be divided into modules stored in separate files.
• The task of collecting the source program is sometimes entrusted to a separate program, called a preprocessor.
• It may also expand shorthands, called macros, into source language statements.
Programs related to compilers (COUSINS OF COMPILER)
▪ A preprocessor produces input to a compiler.
▪ It may perform the following functions:
• 1. Macro processing: a preprocessor may allow a user to define macros that are shorthands for longer constructs.
• 2. File inclusion: a preprocessor may include header files into the program text.
• 3. Rational preprocessors: these preprocessors augment older languages with more modern flow-of-control and data-structuring facilities.
• e.g., while-statements or if-statements where none exist in the language itself
• 4. Language extensions: these preprocessors attempt to add capabilities to the language by what amount to built-in macros.
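Macro processing and file inclusion can be sketched in a few lines of Python. This is a simplified illustration, not the real C preprocessor: it assumes a hypothetical in-memory header `consts.h` instead of real files, supports only `#include "file"` and object-like `#define NAME value`, and does not rescan expansions or handle function-like macros.

```python
import re

# Hypothetical in-memory "header file" standing in for the file system.
FILES = {"consts.h": "#define PI 3.14159\n#define TWO 2\n"}

def preprocess(text, macros=None):
    """Handle '#include "file"' and object-like '#define NAME value' lines."""
    macros = {} if macros is None else macros
    out = []
    for line in text.splitlines():
        inc = re.match(r'\s*#include\s+"([^"]+)"', line)
        dfn = re.match(r"\s*#define\s+(\w+)\s+(.*)", line)
        if inc:                                       # 2. file inclusion
            included = preprocess(FILES[inc.group(1)], macros)
            if included:
                out.append(included)
        elif dfn:                                     # 1. macro definition
            macros[dfn.group(1)] = dfn.group(2).strip()
        else:                                         # 1. macro expansion (whole words only)
            out.append(re.sub(r"\w+", lambda m: macros.get(m.group(0), m.group(0)), line))
    return "\n".join(out)

source = '#include "consts.h"\narea = PI * r * r * TWO;'
print(preprocess(source))   # area = 3.14159 * r * r * 2;
```

The output is ordinary source text with no directives left, which is why the preprocessor's output can be handed straight to the compiler.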
Language Processing System
[Figure: source program → preprocessor → modified source program → compiler → target assembly program → assembler → relocatable machine code → linker/loader (together with library files) → target machine code]
NB: Relocatable means that the code can be loaded starting at any location L in memory; i.e., if L is added to all addresses in the code, then all references will be correct. The relocatable machine code file must retain the information in the symbol table for each data location or instruction label that is referred to externally.
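The NB on relocation (add the load address L to every address) can be illustrated with a small Python sketch. The module format here, a list of (opcode, address) pairs plus a list marking which instructions hold relocatable addresses, is a hypothetical simplification and not a real object-file format.

```python
# Hypothetical relocatable module: every address is an offset from the start
# of the module, as if it were loaded at address 0.  (The data words at
# offsets 4..6 are not shown.)
module = {
    "code": [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", None)],
    "reloc": [0, 1, 2],   # instructions whose operand is a relocatable address
}

def relocate(module, load_address):
    """Return the code with the load address L added to every relocatable address."""
    code = list(module["code"])
    for i in module["reloc"]:
        op, addr = code[i]
        code[i] = (op, addr + load_address)
    return code

print(relocate(module, 1000))
# [('LOAD', 1004), ('ADD', 1005), ('STORE', 1006), ('HALT', None)]
```

The relocation list plays the role of the retained information mentioned in the NB: without it, the loader could not tell an address operand from any other number.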



Language Processing System
▪ A source program may be divided into modules stored in separate files.
▪ The task of collecting the source program is sometimes entrusted to a
separate program, called a preprocessor.
▪ The preprocessor may also expand short-hands, called macros, into
source language statements
▪ Large programs are often compiled in pieces, so the relocatable
machine code may have to be linked together with other
relocatable object files and library files into the code that
actually runs on the machine.
▪ The linker resolves external memory addresses, where the
code in one file may refer to a location in another file.
▪ The loader then puts together all of the executable object files
into memory for execution.
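How a linker resolves a reference from code in one file to a location in another can be sketched as a toy two-pass linker in Python. The object-module layout (`code`, `defines`, `uses`) and the symbol names `main` and `print_sum` are hypothetical; relocation of module-internal addresses is left out to keep the example short.

```python
# Hypothetical object modules (addresses are instruction indices):
#   code    - list of (opcode, operand) pairs
#   defines - symbols the module exports -> offset within the module
#   uses    - instruction index -> external symbol whose address is needed there
main_o = {"code": [("CALL", None), ("HALT", None)],
          "defines": {"main": 0},
          "uses": {0: "print_sum"}}
lib_o = {"code": [("OUT", None), ("RET", None)],
         "defines": {"print_sum": 0},
         "uses": {}}

def link(modules):
    """Toy two-pass linker.  Relocation of module-internal addresses is
    omitted; only external references are resolved."""
    symtab, base = {}, 0
    for m in modules:                        # pass 1: place modules, collect symbols
        for name, offset in m["defines"].items():
            if name in symtab:
                raise ValueError(f"duplicate symbol {name!r}")
            symtab[name] = base + offset
        base += len(m["code"])
    image = []
    for m in modules:                        # pass 2: copy code, patch references
        for i, (op, arg) in enumerate(m["code"]):
            if i in m["uses"]:
                name = m["uses"][i]
                if name not in symtab:
                    raise ValueError(f"undefined symbol {name!r}")
                arg = symtab[name]
            image.append((op, arg))
    return image, symtab

image, symtab = link([main_o, lib_o])
print(image)    # [('CALL', 2), ('HALT', None), ('OUT', None), ('RET', None)]
print(symtab)   # {'main': 0, 'print_sum': 2}
```

Two passes are used because a reference may point to a module that has not been placed yet; the first pass fixes every symbol's address before any code is patched.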
Compiler Construction and Related Computer Science Topics
▪ Theory - Finite State Automata, Grammars and Parsing
▪ Algorithms - Graph manipulation, dynamic programming
▪ Data structures - Symbol tables, abstract syntax trees
▪ Systems - Allocation and naming, multi-pass systems, compiler
construction
▪ Computer Architecture - Memory hierarchy, instruction selection,
interlocks and latencies, parallelism
▪ Security - Detection of and Protection against vulnerabilities
▪ Software Engineering - Software development environments,
debugging
▪ Artificial Intelligence - Heuristic-based search for the best optimizations
Analysis of source program
▪ In compiling, analysis consists of three phases:
• Linear analysis, in which the stream of characters making up the
source program is read from left-to-right and grouped into tokens
that are sequences of characters having a collective meaning.
• Hierarchical analysis, in which characters or tokens are grouped
hierarchically into nested collections with collective meaning.
• Semantic analysis, in which certain checks are performed to
ensure that the components of a program fit together
meaningfully.
• See the phases of a compiler for details



Why Study Compilers?
▪ Compilers enable programming in a high-level language instead of machine instructions.
• Malleability, portability, modularity, programmer productivity
▪ Increases understanding of language semantics
• Seeing the machine code generated for language constructs helps in understanding the performance issues of languages
• Teaches good language design
• New devices may need device-specific languages
• New business fields may need domain-specific languages



Why Study Compilers?
▪ Become a better programmer
▪ Insight into interaction between languages, compilers, and
hardware
▪ Compiler techniques are everywhere
▪ Parsing (little languages, interpreters, HTML)
▪ Database engines, query languages
▪ Text processing
▪ Fascinating blend of theory and engineering
▪ Direct applications of theory to practice
▪ Parsing, scanning, static analysis
▪ Resource allocation, “optimization”, etc.
▪ You might even write a compiler some day!
Grouping of Phases into Passes / Parts of Compilation
▪ A compiler is not a single box that maps a source program into a target program.
▪ There are two parts to this mapping: analysis and synthesis
• Analysis (front end) [lexical, syntax, and semantic analysis]
• breaks up the source program into its constituent pieces
• creates an intermediate representation of the source program
• reports any errors detected
• stores information about the source program in a data structure called a symbol table
• machine independent / language dependent, because it depends primarily on the source language
• Synthesis (back end) [code optimization + code generation]
• constructs the desired target program from the intermediate representation and the information in the symbol table
• machine dependent / language independent, because it depends primarily on the target machine
▪ The compilation process operates as a sequence of phases,
• each of which transforms one representation of the source program into another.
• NB: Intermediate code generation sits between the front end and the back end.
The Phases of a Compiler…



Lexical Analyzer (Scanner)
▪ Also called the Lexer
▪ How it works:
• Reads characters from the source program.
• Groups the characters into lexemes (sequences of characters that "go together").
• Each lexeme corresponds to a token: for each lexeme, the lexical analyzer produces as output a token of the form (token-name, attribute-value).
• The scanner returns the next token (plus possibly some additional information) to the parser.
• The scanner may also discover lexical errors (e.g., erroneous characters).
• It enters newly found symbols (e.g., identifiers) into the symbol table.
Lexical Analyzer (Scanner)…
▪ Tokens include e.g.:
• “Reserved words”: do if float while
• Special characters: ( { , + - = ! /
• Names & numbers: myValue, 3.07e02
▪ What counts as a lexeme, a token, or a bad character depends on the definition of the source language.
▪ Examples of tools for lexical analysis are
• Lex
• Flex
▪ A lexeme is a sequence of characters in the source program that is matched by the pattern for a token.



Lexical Analyzer - Examples
▪ Example 1: consider the statement sum = 3 + 2; in the C programming language. It is tokenized as follows:
Lexeme   Token        Token type
sum      identifier   Identifier
=        assign       Assignment operator
3        number       Integer literal
+        addition     Addition operator
2        number       Integer literal
;        semicolon    End of statement
▪ Example 2: in position := initial + rate * 60; all of position, :=, initial, +, rate, *, 60, and ; are lexemes.
▪ Blanks, line breaks, etc. are scanned out (discarded)
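The scanner's job can be sketched in Python with the standard `re` module: one named pattern per token class, whitespace skipped, any other character reported as a lexical error, and each new identifier entered into a symbol table. The token names (`id`, `assign`, `op`, `number`, `semi`) and the choice of the symbol-table entry number as an identifier's attribute are assumptions for illustration, not the output format of Lex or Flex.

```python
import re

# Token patterns for a tiny C-like fragment; the token names are assumptions.
TOKEN_SPEC = [
    ("number", r"\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"),   # 60, 3.07e02
    ("id",     r"[A-Za-z_]\w*"),
    ("assign", r":=|="),
    ("op",     r"[+\-*/]"),
    ("semi",   r";"),
    ("skip",   r"\s+"),          # blanks and line breaks are scanned out
    ("error",  r"."),            # any other character is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (token-name, attribute-value) pairs and a symbol table."""
    symtab, tokens = [], []
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "skip":
            continue
        if kind == "error":
            raise SyntaxError(f"illegal character {lexeme!r} at offset {m.start()}")
        if kind == "id":
            if lexeme not in symtab:
                symtab.append(lexeme)                 # new symbol found
            tokens.append(("id", symtab.index(lexeme) + 1))   # attribute: entry number
        else:
            tokens.append((kind, lexeme))
    return tokens, symtab

tokens, symtab = tokenize("position = initial + rate * 60;")
print(tokens)
# [('id', 1), ('assign', '='), ('id', 2), ('op', '+'), ('id', 3), ('op', '*'), ('number', '60'), ('semi', ';')]
print(symtab)   # ['position', 'initial', 'rate']
```

Numbering identifiers by their symbol-table entry is what produces the names id1, id2, and id3 that appear in the intermediate code for this statement.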
Syntax Analyzer (Parser)
▪ Also known as Hierarchical Analysis/ Parsing
▪ Constructs a parse tree from the tokens
▪ A pattern-matching problem
• The language grammar is defined by a set of rules that identify legal (meaningful) combinations of symbols
• Each application of a rule results in a node in the parse tree
• The parser applies these rules repeatedly to the program until the leaves of the parse tree are “atoms”
▪ If no pattern matches, it is a syntax error
▪ YACC and Bison are tools for generating parsers



Syntax Analyzer - Example
▪ Source code:
position = initial + rate * 60;
▪ Abstract syntax tree (written in prefix form): = ( position, + ( initial, * ( rate, 60 ) ) )
• Interior nodes of the tree are OPERATORS;
• a node's children are its OPERANDS;
• each subtree forms a logical unit.
• The subtree with * at its root shows that * has higher precedence than +: the operation “rate * 60” must be performed as a unit, not “initial + rate”.
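A hand-written recursive-descent parser is one common way to build this tree. The Python sketch below assumes a simplified, hypothetical grammar (stmt → id = expr ;, expr → term { + term }, term → factor { * factor }, factor → id | number) and represents tree nodes as nested tuples. Because `term` handles `*` one level below `expr`, multiplication ends up deeper in the tree, which is exactly the precedence described above. It is a sketch, not the parser that YACC or Bison would generate.

```python
import re

def parse_assignment(src):
    """Recursive-descent parser for the simplified grammar
         stmt   -> id '=' expr ';'
         expr   -> term { '+' term }
         term   -> factor { '*' factor }
         factor -> id | number
    Returns an abstract syntax tree built from nested tuples."""
    # Minimal tokenizer for the sketch; unmatched characters are ignored here,
    # whereas a real scanner would report them as lexical errors.
    toks = re.findall(r"\d+|[A-Za-z_]\w*|[=+*;]", src)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def advance(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'!r}, got {tok!r}")
        pos += 1
        return tok

    def factor():
        tok = advance()
        if tok.isdigit():
            return int(tok)
        if re.fullmatch(r"[A-Za-z_]\w*", tok):
            return tok
        raise SyntaxError(f"unexpected {tok!r}")

    def term():                       # '*' is handled here, below '+'
        node = factor()
        while peek() == "*":
            advance("*")
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance("+")
            node = ("+", node, term())
        return node

    target = factor()
    if not isinstance(target, str):
        raise SyntaxError("left side of '=' must be an identifier")
    advance("=")
    tree = ("=", target, expr())
    advance(";")
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r} after ';'")
    return tree

print(parse_assignment("position = initial + rate * 60;"))
# ('=', 'position', ('+', 'initial', ('*', 'rate', 60)))
```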
Semantic Analyzer
▪ Checks the source program for semantic errors, e.g., type errors
• Annotates and/or changes the abstract syntax tree based on the attribute grammar
• Annotates a node that represents an expression with its type.
• Example: if position, initial, and rate are floating-point variables, the integer constant 60 in position = initial + rate * 60 must be converted, so the node 60 becomes inttofloat(60).
▪ The most important activity in this phase:
• Type checking: the compiler checks that each operator has operands that are permitted by the source-language specification.
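Type checking and the insertion of a conversion node can be sketched as a recursive walk over the tree. In this Python sketch the declarations in `SYMTAB` are hypothetical (all three variables float), the only types are int and float, and the coercion rule (wrap the int operand in an `inttofloat` node when it meets a float) is a simplifying assumption; the node name `inttofloat` is taken from the three-address code for this example.

```python
# Hypothetical declarations for the running example: all three variables are float.
SYMTAB = {"position": "float", "initial": "float", "rate": "float"}

def check(node, symtab):
    """Type-check an AST of nested tuples and return (annotated_node, type).
    Simplified rule: when an int operand meets a float operand, the int
    operand is wrapped in an ('inttofloat', operand) node."""
    if isinstance(node, int):
        return node, "int"
    if isinstance(node, float):
        return node, "float"
    if isinstance(node, str):
        if node not in symtab:
            raise TypeError(f"undeclared identifier {node!r}")
        return node, symtab[node]
    op, left, right = node
    (l, lt), (r, rt) = check(left, symtab), check(right, symtab)
    if op == "=":
        if lt == "int" and rt == "float":
            raise TypeError(f"cannot assign float to int variable {l!r}")
        return ("=", l, ("inttofloat", r) if lt != rt else r), lt
    if lt != rt:                              # mixed int/float operands
        if lt == "int":
            l = ("inttofloat", l)
        else:
            r = ("inttofloat", r)
        return (op, l, r), "float"
    return (op, l, r), lt

tree = ("=", "position", ("+", "initial", ("*", "rate", 60)))
annotated, ty = check(tree, SYMTAB)
print(annotated)
# ('=', 'position', ('+', 'initial', ('*', 'rate', ('inttofloat', 60))))
```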
Intermediate Code Generator
▪ Translates the abstract syntax tree into intermediate code
▪ In other words, it takes the output of semantic analysis and converts it into intermediate code such as:
• three-address code
• Each statement contains at most three operands (in addition to the assignment “:=”).
• An “easy” and “universal” format that can be translated into most assembly languages.
• Here is the three-address code for the abstract syntax tree of position = initial + rate * 60:
– t1 = inttofloat(60)
– t2 = id3 * t1
– t3 = id2 + t2
– id1 = t3
▪ NB: Three-address code consists of a sequence of instructions, each of which has at most three operands.
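The listing above can be produced by a post-order walk of the annotated tree that allocates a new temporary for every interior node. In this Python sketch the tuple-based tree, the `t1, t2, ...` temporaries, and the renaming of identifiers to `id1, id2, ...` in order of first appearance (standing in for symbol-table entry numbers) are assumptions chosen to reproduce the listing.

```python
from itertools import count

def gen_tac(tree):
    """Generate three-address code from an annotated assignment AST.
    Identifiers are renamed id1, id2, ... in order of first appearance,
    standing in for symbol-table entry numbers (an assumption)."""
    ids, code, temps = {}, [], count(1)

    def name(x):
        if isinstance(x, str):
            return ids.setdefault(x, f"id{len(ids) + 1}")
        return str(x)

    def walk(node):
        """Emit code for node and return the name that holds its value."""
        if not isinstance(node, tuple):
            return name(node)
        if node[0] == "inttofloat":
            arg = walk(node[1])
            temp = f"t{next(temps)}"
            code.append(f"{temp} = inttofloat({arg})")
            return temp
        op, left, right = node
        l, r = walk(left), walk(right)        # operands first (post-order)
        temp = f"t{next(temps)}"
        code.append(f"{temp} = {l} {op} {r}")
        return temp

    _, target, value = tree                   # tree has the form ("=", target, expr)
    dest = name(target)                       # the target becomes id1
    code.append(f"{dest} = {walk(value)}")
    return code

annotated = ("=", "position", ("+", "initial", ("*", "rate", ("inttofloat", 60))))
for line in gen_tac(annotated):
    print(line)
# t1 = inttofloat(60)
# t2 = id3 * t1
# t3 = id2 + t2
# id1 = t3
```

Each emitted line has at most three addresses (one result, two operands), which is what makes the format easy to translate into most assembly languages.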
Code Optimization
▪ Improves the efficiency of the intermediate code.
• The goal may be to make the code run faster and/or use fewer registers.
• Before optimization:
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
• After optimization:
t2 = id3 * 60.0
id1 = id2 + t2
▪ Current trends:
• to obtain smaller, but possibly slower, equivalent code for embedded systems;
• to reduce power consumption;
• to enable parallelism.
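The before/after example combines two simple local optimizations: the conversion of the constant 60 is done once at compile time (constant folding), and the copy `id1 = t3` is removed by computing the sum directly into `id1`. The Python sketch below assumes a hypothetical representation of each instruction as a `(dest, op, arg1, arg2)` tuple, with `"copy"` for plain assignments, and assumes that temporaries are named `t1, t2, ...` and each is used only once. Real optimizers check the use counts through data-flow analysis rather than assuming them.

```python
def optimize(code):
    """Two local optimizations on straight-line three-address code:
    1. constant folding: inttofloat applied to an int constant becomes a
       float constant that replaces the temporary wherever it is used;
    2. copy elimination: 't = a op b; x = t' becomes 'x = a op b'
       (assumes temporaries are named t1, t2, ... and used only once)."""
    consts = {}                                   # temporary -> folded constant
    folded = []
    for dest, op, a, b in code:
        a, b = consts.get(a, a), consts.get(b, b)
        if op == "inttofloat" and isinstance(a, int):
            consts[dest] = float(a)               # t1 = inttofloat(60)  ->  60.0
            continue
        folded.append((dest, op, a, b))
    result = []
    for dest, op, a, b in folded:
        prev = result[-1] if result else None
        if op == "copy" and prev and prev[0] == a and a.startswith("t"):
            result[-1] = (dest,) + prev[1:]       # compute straight into dest
        else:
            result.append((dest, op, a, b))
    return result

tac = [("t1", "inttofloat", 60, None),
       ("t2", "*", "id3", "t1"),
       ("t3", "+", "id2", "t2"),
       ("id1", "copy", "t3", None)]
for dest, op, a, b in optimize(tac):
    print(f"{dest} = {a} {op} {b}")
# t2 = id3 * 60.0
# id1 = id2 + t2
```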
Code Generation
▪ A compiler may generate
• pure machine code (machine-dependent assembly language) directly, which is rare now;
• virtual machine code.
▪ Generates object code from the (optimized) intermediate code, e.g.:
• Intermediate code:
t2 = id3 * 60.0
id1 = id2 + t2
• Generated code:
LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1
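A naive code generator that reproduces this output can be sketched in Python. The instruction names (LDF, MULF, ADDF, STF) and the `#` prefix for immediate constants are taken from the example. The register-handling strategy is an assumption of this sketch: a temporary stays in the register that computed it, variables are stored back to memory, and registers are handed out from a fixed pool with no reuse or spilling.

```python
def codegen(tac):
    """Naive code generator for the register machine used in the example.
    tac: list of (dest, op, arg1, arg2) with op in {'*', '+'}.
    Temporaries stay in registers; variables are stored back to memory."""
    OPS = {"*": "MULF", "+": "ADDF"}
    free = ["R1", "R2"]                # pop() hands out R2 first, then R1
    where = {}                         # temporary -> register holding it
    asm = []

    def operand(x):
        if isinstance(x, float):
            return f"#{x}"             # immediate constant
        return where.get(x, x)         # register if computed, else memory name

    for dest, op, a, b in tac:
        if a in where:                 # first operand already in a register
            reg = where[a]
        else:
            reg = free.pop()
            asm.append(f"LDF {reg}, {a}")
        asm.append(f"{OPS[op]} {reg}, {reg}, {operand(b)}")
        if dest.startswith("t"):       # keep temporaries in registers
            where[dest] = reg
        else:                          # store program variables
            asm.append(f"STF {dest}, {reg}")
    return asm

optimized = [("t2", "*", "id3", 60.0), ("id1", "+", "id2", "t2")]
print("\n".join(codegen(optimized)))
# LDF R2, id3
# MULF R2, R2, #60.0
# LDF R1, id2
# ADDF R1, R1, R2
# STF id1, R1
```

A real code generator would also pick instructions by type, reuse registers whose values are dead, and spill to memory when registers run out. Those steps are why this phase is machine dependent.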



Phases of Compilers (Summary)



Symbol Table
▪ Symbol table management is a part of the compiler that
interacts with several of the phases
– Identifiers are found during lexical analysis and placed in the symbol table
– During syntax and semantic analysis, type and scope information is added
– During code generation, type information is used to determine which instructions to use
– During optimization, the results of “live analysis” may be kept in the symbol table
▪ Most suitably implemented as a dynamic data structure (linear list, binary tree, hash table)
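A hash-table symbol table that also records scope can be sketched in Python as a stack of dictionaries, one per scope, with lookups searching from the innermost scope outward. The attribute names (`type`, `level`) and the method names are illustrative assumptions; a compiler would store whatever information its phases need.

```python
class SymbolTable:
    """Scoped symbol table: a stack of Python dicts (hash tables).
    The stored attributes (type, scope level) are illustrative assumptions."""

    def __init__(self):
        self.scopes = [{}]                       # global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def insert(self, name, **attrs):
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"{name!r} already declared in this scope")
        scope[name] = dict(attrs, level=len(self.scopes) - 1)

    def lookup(self, name):
        for scope in reversed(self.scopes):      # innermost scope wins
            if name in scope:
                return scope[name]
        return None

st = SymbolTable()
st.insert("rate", type="float")
st.enter_scope()
st.insert("rate", type="int")                    # shadows the global 'rate'
print(st.lookup("rate"))                         # {'type': 'int', 'level': 1}
st.exit_scope()
print(st.lookup("rate"))                         # {'type': 'float', 'level': 0}
```

Each scope's dictionary gives expected constant-time insert and lookup, and popping a dictionary discards all names declared in a block at once when that block ends.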
Handling Errors
▪ Error handling and reporting also occurs across many
phases
– Lexical analyzer reports invalid character sequences
– Syntactic analyzer reports invalid token sequences
– Semantic analyzer reports type and scope errors, and the like

▪ The compiler may be able to continue after some errors, but other errors may stop the process
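Continuing after an error lets the compiler report several problems in a single run. At the lexical level, the simplest recovery is to report the bad character with its position, skip it, and resume scanning. The Python sketch below shows this idea. Its token pattern is a small assumed subset, and the message format is hypothetical.

```python
import re

def scan_with_recovery(source):
    """Scanner sketch that reports each invalid character with its line and
    column, skips it, and keeps scanning, so several lexical errors can be
    reported in one run."""
    token_re = re.compile(r"\s+|[A-Za-z_]\w*|\d+|[=+*;()-]")
    tokens, errors = [], []
    line, line_start, pos = 1, 0, 0
    while pos < len(source):
        m = token_re.match(source, pos)
        if m is None:                          # lexical error: report, skip one char
            errors.append(f"line {line}, col {pos - line_start + 1}: "
                          f"invalid character {source[pos]!r}")
            pos += 1
            continue
        text = m.group()
        if text.isspace():
            for i, ch in enumerate(text):      # track line numbers for messages
                if ch == "\n":
                    line, line_start = line + 1, pos + i + 1
        else:
            tokens.append(text)
        pos = m.end()
    return tokens, errors

tokens, errors = scan_with_recovery("x = 1 @ 2;\ny = x # 3;")
for e in errors:
    print(e)
# line 1, col 7: invalid character '@'
# line 2, col 7: invalid character '#'
```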



Compiler Construction Tools
▪ Scanner generators: produce lexical analyzers, e.g., Lex (Flex)
▪ Parser generators: produce syntax analyzers, e.g., YACC (Yet Another Compiler-Compiler)
▪ Syntax-directed translation engines: generate intermediate code, e.g., YACC (Bison)
▪ Automatic code generators: generate the actual code
• i.e., they take a collection of rules for translating the intermediate language into machine language.
▪ Data-flow engines: support optimization
• i.e., they support code optimization using data-flow analysis, that is, the gathering of information about how values are transmitted from one part of a program to each other part.
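Live-variable analysis is one of the classic data-flow problems such engines solve. The Python sketch below handles only a single straight-line block, with each statement written as a hypothetical `(dest, operands)` pair, and applies the backward equation in = (out − def) ∪ use. A full data-flow engine would iterate this equation over all blocks of a control-flow graph until nothing changes.

```python
def live_variables(block, live_out=frozenset()):
    """Backward live-variable analysis for one straight-line block.
    Each statement is (dest, operands).  Returns, for every statement, the set
    of variables live just before it, using  in = (out - def) | use."""
    live = set(live_out)
    live_in = []
    for dest, operands in reversed(block):             # walk backwards
        live.discard(dest)                             # def: dest is overwritten here
        live.update(x for x in operands if isinstance(x, str))   # use: names read here
        live_in.append(frozenset(live))
    return list(reversed(live_in))

# t1 = id3 * 60.0 ; id1 = id2 + t1   (id1 is assumed live after the block)
block = [("t1", ("id3", 60.0)), ("id1", ("id2", "t1"))]
for (dest, _), live in zip(block, live_variables(block, {"id1"})):
    print(dest, sorted(live))
# t1 ['id2', 'id3']
# id1 ['id2', 't1']
```

The result shows that `t1` is live only between its definition and its single use. This is the kind of information an optimizer or register allocator uses to reuse registers safely.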
Types of compiler
❑ One-pass compiler
❑ A compiler that completes the whole compilation process in a single pass, i.e., it traverses the whole source code only once.
❑ Threaded-code compiler
❑ A compiler that simply replaces a string (e.g., the name of a subroutine) with the appropriate binary code.
❑ Incremental compiler
❑ A compiler that recompiles only the changed lines of the source code and updates the object code accordingly.



Types of compiler
❑ Stage compiler
❑ A compiler that converts the code into assembly code only.
❑ Just-in-time compiler
❑ A compiler that converts the code into machine code after the program starts execution.
❑ Retargetable compiler
❑ A compiler that can easily be modified to compile source code for different CPU architectures.
❑ Parallelizing compiler
❑ A compiler capable of compiling code for a parallel computer architecture.
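The core idea of an incremental compiler, recompiling only what changed, can be sketched in Python by caching object code under a hash of each unit's text. In this sketch a unit is one source line, matching the description above, and `compile_unit` is a hypothetical stand-in for a real per-unit compiler. Real incremental compilers also track dependencies between units so that a change in one unit triggers recompilation of the units that depend on it. This sketch does not.

```python
import hashlib

class IncrementalCompiler:
    """Caches object code per unit (here, one source line) under a SHA-256
    hash of the unit's text, so only units whose text changed are compiled
    again.  compile_unit is a hypothetical stand-in for a real compiler."""

    def __init__(self, compile_unit):
        self.compile_unit = compile_unit
        self.cache = {}              # hash of unit text -> object code
        self.compiled = 0            # how many units were actually compiled

    def build(self, source):
        objects = []
        for unit in source.splitlines():
            key = hashlib.sha256(unit.encode("utf-8")).hexdigest()
            if key not in self.cache:
                self.cache[key] = self.compile_unit(unit)
                self.compiled += 1
            objects.append(self.cache[key])
        return objects

cc = IncrementalCompiler(lambda unit: f"<obj:{unit}>")   # fake "compiler"
cc.build("a = 1\nb = 2\nc = 3")
print(cc.compiled)                  # 3: the first build compiles everything
cc.build("a = 1\nb = 20\nc = 3")
print(cc.compiled)                  # 4: only the changed line was recompiled
```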
