Chapter 1
Outline
Introduction
Programs related to compilers
The translation process
Analysis
• Lexical analysis
• Syntax analysis
• Semantic analysis
Synthesis
• IC generator
• IC optimizer
• Code generator
• Code optimizer
Major data and structures in a compiler
Compiler construction tools
Introduction
What is a compiler?
A program that reads a program written in one language
(the source language) and translates it into an equivalent
program in another language (the target language).
Why do we design compilers?
Why do we study compiler construction techniques?
• Compilers provide an essential interface between
applications and architectures
• Compilers embody a wide range of theoretical techniques
[Figure: Source program (high-level language) → Compiler → Target program (assembly or machine language); the compiler also emits error messages]
[Figure: Input → Target program (exe) → Output]
Introduction…
Different platforms (hardware architectures together with
operating systems such as Windows, Mac, and Unix) require
different machine code, so most programs must be compiled
separately for each platform.
[Figure: one program is passed through a separate compiler for each platform: Unix, Win, Mac]
Programs related to compilers
Interpreter
Is a program that reads a source program and executes it
Works by analyzing and executing the source program
commands one at a time
Does not translate the whole source program into object
code
Interpretation is important when:
Programmer is working in interactive mode and needs to view
and update variables
Running speed is not important
Commands have simple formats, and thus can be quickly
analyzed and executed
Modification or addition to user programs is required as
execution proceeds
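To make the "one command at a time" idea concrete, here is a minimal sketch of an interpreter loop. The command format (`name = term + term ...`) and the function name `interpret` are assumptions made for illustration; they are not taken from any real language.

```python
def interpret(commands, env=None):
    """Analyze and execute each command in turn; no object code is produced."""
    env = {} if env is None else env
    for command in commands:
        # Analyze: split "name = expr" into the target and the expression text.
        name, expr = (part.strip() for part in command.split("=", 1))
        # Execute: add up the terms, looking variables up in the environment.
        total = 0
        for term in expr.split("+"):
            term = term.strip()
            total += int(term) if term.isdigit() else env[term]
        env[name] = total  # the variable is visible (and updatable) immediately
    return env


env = interpret(["x = 4 + 2", "y = x + 10"])
print(env)
```

Because the environment is an ordinary dictionary, a user working interactively could inspect or change `env` between commands, which is the situation the slide describes.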
Programs related to compilers
Interpreter and compiler
[Figure: Source code → compilation → Intermediate code → interpretation (interpreter)]
Programs related to compilers
Interpreter and compiler differences
Interpreter:
• Takes one statement, translates and executes it, and then takes another statement.
• Stops the translation when it encounters the first error.
• Takes less time to analyze the source code.
• Overall execution speed is slower.
Compiler:
• Translates the entire program in one go; the translated program is then executed.
• Generates the error report after translating the entire program.
• Takes a large amount of time to analyze and process the high-level language code.
• Overall execution is faster.
E.g., Compiling Java Programs
The Java compiler produces bytecode, not machine code
Bytecode is converted into machine code by a Java
interpreter
You can run bytecode on any computer that has a Java
interpreter installed
[Figure: Java program → compiler → Java bytecode → interpreter on each platform (Win, Mac, Unix)]
Programs related to compilers…
Assemblers
Translator for the assembly language.
Assembly code is translated into machine code
Output is relocatable machine code.
Linker
Links object files that were compiled or assembled
separately
Links object files to standard library functions
Generates a file that can be loaded and
executed
Programs related to compilers…
Loader
Loads the executable code produced by the linker
into main memory.
Pre-processors
A pre-processor is a separate program that is called
by the compiler before actual translation begins.
Such a pre-processor:
• produces the input to the compiler,
• can delete comments,
• performs macro processing (substitutions),
• can include other files...
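The comment deletion and macro substitution listed above can be sketched in a few lines. This is an illustrative toy, not how a real C pre-processor works: the function name `preprocess` is hypothetical, only `//` comments and object-like `#define NAME body` macros are handled, and comments inside string literals are not recognized.

```python
import re


def preprocess(source):
    """Strip // comments and expand simple #define macros, line by line."""
    macros = {}
    output = []
    for line in source.splitlines():
        line = re.sub(r"//.*", "", line).rstrip()      # delete the comment
        define = re.match(r"\s*#define\s+(\w+)\s+(.*)", line)
        if define:
            macros[define.group(1)] = define.group(2).strip()
            continue                                    # directives are not passed on
        for name, body in macros.items():
            # Replace whole-word occurrences of the macro name with its body.
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, b=body: b, line)
        if line:
            output.append(line)
    return "\n".join(output)


text = "#define SIZE 10 // array length\nint a[SIZE]; // storage\n"
print(preprocess(text))
```

The returned text is what the compiler proper would receive as its input.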
Programs related to compilers
[Figure: C or C++ program → Preprocessor → Assembly code → Assembler → Relocatable object module → Linker → Executable code → Loader → Absolute machine code]
The linker also takes other relocatable object modules or library modules as input.
The translation process
Internally, a compiler consists of a number of steps,
or phases, that perform distinct logical operations.
The phases of a compiler are shown in the next slide,
together with three auxiliary components that
interact with some or all of the phases:
The symbol table,
the literal table,
and error handler.
The translation process…
[Figure: phases of a compiler]
Source code
→ Scanner → Tokens
→ Parser → Syntax tree
→ Semantic analyzer → Annotated tree
→ Intermediate code generator → Intermediate code
→ Intermediate code optimizer → Intermediate code
→ Target code generator → Target code
→ Target code optimizer → Target code
Auxiliary components: literal table, symbol table, error handler
Analysis and Synthesis
Analysis (Front end)
Breaks up the source program into constituent pieces and
creates an intermediate representation of the source
program.
During analysis, the operations implied by the source
program are determined and recorded in a hierarchical
structure called a tree.
Synthesis (Back end)
The synthesis part constructs the desired program from
the intermediate representation.
Analysis of the source program
1. Lexical Analysis or Scanning
The stream of characters making up the source program is
read from left to right and is grouped into tokens.
A token is a sequence of characters having a collective
meaning.
A lexical analyzer, also called a lexer or a scanner,
receives a stream of characters from the source program
and groups them into tokens.
[Figure: Source program → Lexical analyzer → Stream of tokens]
Examples of tokens:
• Identifiers
• Keywords
• Symbols (+, -, …)
• Numbers …
Blanks, new lines, and tab characters are removed during
lexical analysis.
Lexical analysis or Scanning…
Example
a[index] = 4 + 2;
Tokens:
a        identifier
[        left bracket
index    identifier
]        right bracket
=        assignment operator
4        number
+        plus operator
2        number
;        semicolon
A scanner may perform other operations along with the
recognition of tokens.
• It may enter identifiers into the symbol table, and
• It may enter literals into the literal table.
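A scanner for the example statement can be sketched with regular expressions. The token names follow the slide; the function name `scan`, the pattern list, and the use of plain lists for the symbol and literal tables are assumptions made for this illustration.

```python
import re

# (token kind, pattern) pairs, tried in order at the current position.
TOKEN_SPEC = [
    ("whitespace", re.compile(r"\s+")),
    ("number", re.compile(r"\d+")),
    ("identifier", re.compile(r"[A-Za-z_]\w*")),
    ("left bracket", re.compile(r"\[")),
    ("right bracket", re.compile(r"\]")),
    ("assignment operator", re.compile(r"=")),
    ("plus operator", re.compile(r"\+")),
    ("semicolon", re.compile(r";")),
]


def scan(source):
    """Group characters into tokens, reading the source from left to right."""
    tokens, symbol_table, literal_table = [], [], []
    pos = 0
    while pos < len(source):
        for kind, pattern in TOKEN_SPEC:
            m = pattern.match(source, pos)
            if m:
                break
        else:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        lexeme, pos = m.group(), m.end()
        if kind == "whitespace":
            continue                      # blanks are dropped, not tokenized
        if kind == "identifier" and lexeme not in symbol_table:
            symbol_table.append(lexeme)   # enter identifier into symbol table
        if kind == "number" and lexeme not in literal_table:
            literal_table.append(lexeme)  # enter literal into literal table
        tokens.append((lexeme, kind))
    return tokens, symbol_table, literal_table


tokens, symbols, literals = scan("a[index] = 4 + 2;")
for lexeme, kind in tokens:
    print(lexeme, kind)
```

Running it on `a[index] = 4 + 2;` yields the nine tokens of the table above, with `a` and `index` recorded as identifiers and `4` and `2` as literals.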
2. Syntax Analysis or Parsing
Syntax analysis or Parsing…
Context-free syntax is specified with a
grammar
Syntax analysis or Parsing…
Context-free syntax can be put to better use
Grammar:
1. Goal → Expr
2. Expr → Expr Op Term
3.      | Term
4. Term → number
5.      | id
6. Op   → +
7.      | -

S = Goal
T = { number, id, +, - }
N = { Goal, Expr, Term, Op }
P = { 1, 2, 3, 4, 5, 6, 7 }

A derivation of x + 2 - y:
Production   Result
(start)      Goal
1            Expr
2            Expr Op Term
5            Expr Op y
7            Expr - y
2            Expr Op Term - y
4            Expr Op 2 - y
6            Expr + 2 - y
3            Term + 2 - y
5            x + 2 - y
To recognize a valid sentence in some CFG, we reverse this process and
build up a parse
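The derivation of x + 2 - y can be replayed mechanically. The sketch below assumes the seven-production grammar of this slide and always rewrites the rightmost nonterminal; the names `GRAMMAR`, `STEPS`, and `derive` are hypothetical, and the lexemes `x`, `2`, and `y` are supplied by hand where a production yields `id` or `number`.

```python
GRAMMAR = {
    1: ("Goal", ["Expr"]),
    2: ("Expr", ["Expr", "Op", "Term"]),
    3: ("Expr", ["Term"]),
    4: ("Term", ["number"]),
    5: ("Term", ["id"]),
    6: ("Op", ["+"]),
    7: ("Op", ["-"]),
}
NONTERMINALS = {"Goal", "Expr", "Term", "Op"}

# (production number, lexeme standing in for id/number, or None)
STEPS = [(1, None), (2, None), (5, "y"), (7, None), (2, None),
         (4, "2"), (6, None), (3, None), (5, "x")]


def derive(steps):
    """Apply each production to the rightmost nonterminal; return every form."""
    form = ["Goal"]
    history = [" ".join(form)]
    for number, lexeme in steps:
        lhs, rhs = GRAMMAR[number]
        index = max(i for i, sym in enumerate(form) if sym in NONTERMINALS)
        if form[index] != lhs:
            raise ValueError(f"production {number} does not apply to {form[index]}")
        form[index:index + 1] = [lexeme] if lexeme is not None else list(rhs)
        history.append(" ".join(form))
    return history


for sentential_form in derive(STEPS):
    print(sentential_form)
```

A parser does the opposite job: it starts from the final line, `x + 2 - y`, and works out which productions must have been applied.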
Syntax analysis or Parsing…
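A parse can be represented by a tree (parse
tree or syntax tree)
[Figure: parse tree for x + 2 - y, rooted at Goal, with Goal → Expr and Expr → Expr Op Term]
Abstract syntax tree (AST) for x + 2 - y:
              -
            /   \
          +      <id,y>
        /   \
  <id,x>     <number,2>
The AST summarizes grammatical structure,
without including detail about the derivation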
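As an illustration of how a parse builds such a tree, the sketch below parses the token sequence for x + 2 - y into a nested-tuple AST. It is a hand-written parser under stated assumptions: the left-recursive rule Expr → Expr Op Term is rewritten as a loop, tokens are hypothetical `(kind, lexeme)` pairs, and `parse_expr` is an illustrative name.

```python
def parse_expr(tokens):
    """Parse Term (Op Term)* and return a left-leaning AST of nested tuples."""
    pos = 0

    def term():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] not in ("id", "number"):
            raise SyntaxError(f"expected id or number at token {pos}")
        leaf = tokens[pos]          # leaves are (kind, lexeme) pairs
        pos += 1
        return leaf

    tree = term()
    while pos < len(tokens):
        kind, lexeme = tokens[pos]
        if kind != "op":
            raise SyntaxError(f"expected + or - at token {pos}")
        pos += 1
        # Interior nodes are (operator, left subtree, right subtree).
        tree = (lexeme, tree, term())
    return tree


tokens = [("id", "x"), ("op", "+"), ("number", "2"), ("op", "-"), ("id", "y")]
print(parse_expr(tokens))
```

The result nests the `+` node under the `-` node, matching the shape of the AST on the slide.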
Intermediate Code Generator/Representation
Comes after syntax and semantic analysis
Separates the compiler front end from its backend
Intermediate representation should have 2 important
properties:
Should be easy to produce
Should be easy to translate into the target program
Intermediate representation can have a variety of forms:
Three-address code, AST, or DAG representation
[Figure: Abstract syntax → Intermediate code generator → Intermediate code]
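To show why three-address code is easy to produce from a tree, here is a minimal sketch that walks an AST for x + 2 - y in post-order. The nested-tuple AST format, the temporary names `t1`, `t2`, and the function name `to_three_address` are all assumptions made for illustration.

```python
def to_three_address(tree):
    """Emit one 'temp = left op right' line per interior node of the AST."""
    code = []

    def visit(node):
        if len(node) == 2:              # leaf: (kind, lexeme)
            return node[1]
        op, left, right = node          # interior node: (op, left, right)
        lhs = visit(left)
        rhs = visit(right)
        temp = f"t{len(code) + 1}"      # fresh temporary for this result
        code.append(f"{temp} = {lhs} {op} {rhs}")
        return temp

    visit(tree)
    return code


ast = ("-", ("+", ("id", "x"), ("number", "2")), ("id", "y"))
for line in to_three_address(ast):
    print(line)
```

Each emitted line has at most one operator and three addresses, which is what makes the later translation to target instructions straightforward.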
Code generator…
The code generator takes the IR code and generates code for
the target machine.
a ← b × c + d
e ← f + b × c + d
becomes
load  @b     => r1
load  @c     => r2
mult  r1,r2  => r3
load  @d     => r4
add   r3,r4  => r5      (computes b × c + d)
store r5     => @a
load  @f     => r6
add   r5,r6  => r7      (reuses b × c + d)
store r7     => @e
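A naive code generator for this example can be sketched as follows. It is only an illustration: the three-address input format `(dest, left, op, right)`, the unlimited supply of registers, the `=>` separator in the printed instructions, and the function name `generate` are assumptions, not part of the slide.

```python
OPCODES = {"*": "mult", "+": "add", "-": "sub"}


def generate(tac, temporaries):
    """Translate three-address tuples into load/op/store instructions."""
    code = []
    register_of = {}                      # name -> register holding its value

    def new_register():
        return f"r{len(register_of) + 1}"

    def operand(name):
        if name not in register_of:       # load a variable only on first use
            register_of[name] = new_register()
            code.append(f"load @{name} => {register_of[name]}")
        return register_of[name]

    for dest, left, op, right in tac:
        r_left, r_right = operand(left), operand(right)
        register_of[dest] = new_register()
        code.append(f"{OPCODES[op]} {r_left},{r_right} => {register_of[dest]}")
        if dest not in temporaries:       # program variables go back to memory
            code.append(f"store {register_of[dest]} => @{dest}")
    return code


# a = b * c + d ; e = f + (b * c + d), with the shared value reused through a
tac = [("t1", "b", "*", "c"), ("a", "t1", "+", "d"), ("e", "a", "+", "f")]
for instruction in generate(tac, temporaries={"t1"}):
    print(instruction)
```

Because the value of `a` is still in a register when `e` is computed, the generator reuses it instead of recomputing b × c + d.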
The Optimizer (or Middle End)
Traditional Three-part Compiler
[Figure: Source code → Front End → IR → Optimizer (Middle End) → IR → Back End → Machine code; each part can report errors]
The Optimizer (or Middle End)
[Figure: IR → Opt 1 → IR → Opt 2 → IR → Opt 3 → IR → ... → Opt n → IR; each pass can report errors]
Modern optimizers are structured as a series of passes
Typical Transformations
Discover & propagate some constant value
Move a computation to a less frequently executed place
Specialize some computation based on context
Discover a redundant computation & remove it
Remove useless or unreachable code
Encode an idiom in some particularly efficient form
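Two of these transformations, constant propagation and removal of useless code, can be sketched as passes that each take IR and return IR. The instruction format `(dest, op, a, b)` on straight-line code and the function names are assumptions made for this example.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def propagate_constants(code):
    """Replace operands with known constants and fold constant operations."""
    known, out = {}, []
    for dest, op, a, b in code:
        a, b = known.get(a, a), known.get(b, b)
        if op != "const" and isinstance(a, int) and isinstance(b, int):
            op, a, b = "const", OPS[op](a, b), None
        if op == "const":
            known[dest] = a
        else:
            known.pop(dest, None)         # dest no longer holds a constant
        out.append((dest, op, a, b))
    return out


def remove_dead_code(code, live_out):
    """Drop instructions whose result is never used (backward scan)."""
    live, kept = set(live_out), []
    for dest, op, a, b in reversed(code):
        if dest in live:
            live.discard(dest)
            live.update(x for x in (a, b) if isinstance(x, str))
            kept.append((dest, op, a, b))
    kept.reverse()
    return kept


ir = [("x", "const", 4, None), ("y", "const", 2, None),
      ("t1", "+", "x", "y"), ("t2", "*", "t1", "n"), ("unused", "-", "t1", "x")]
passes = [propagate_constants, lambda code: remove_dead_code(code, {"t2"})]
for run_pass in passes:                   # the optimizer is a series of passes
    ir = run_pass(ir)
print(ir)
```

Chaining passes this way mirrors the IR → Opt → IR structure: each pass can be written, tested, and reordered independently.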
Major Data and Structures in a Compiler…
Symbol Table
Keeps information associated with all kinds of tokens:
Major Data and Structures in a Compiler…
Literal Table
Stores constant values and string literals in a
program.
One literal table applies globally to the entire
program.
Used by the code generator to:
• Assign addresses for literals.
Avoids the replication of constants and strings.
Quick insertion and lookup are essential.
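A literal table with these properties can be sketched with a hash map. The class name `LiteralTable`, the method name `intern`, and the storage sizes (4 bytes per integer, string length plus a terminator byte) are hypothetical choices for illustration.

```python
class LiteralTable:
    """One global table mapping each distinct literal to a single address."""

    def __init__(self):
        self._addresses = {}      # dict gives quick insertion and lookup
        self._next_address = 0

    def intern(self, value):
        """Return the literal's address, allocating space on first sight."""
        key = (type(value), value)
        if key not in self._addresses:
            self._addresses[key] = self._next_address
            # Assumed sizes: strings take len + 1 bytes, integers take 4.
            size = len(value.encode()) + 1 if isinstance(value, str) else 4
            self._next_address += size
        return self._addresses[key]


table = LiteralTable()
print(table.intern("hello"))   # first literal
print(table.intern(42))        # placed after the string
print(table.intern("hello"))   # same address as before: no replication
```

The code generator would call `intern` whenever it meets a constant, so repeated constants and strings share one stored copy.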
Compiler construction tools
Various tools are used in the construction of the
various parts of a compiler.
Scanner generators
Ex. Lex, flex, JLex
These tools generate a scanner (lexical
analyzer) from regular expressions that describe the tokens.
Parser Generators
Ex. Yacc, Bison
These tools produce a parser (syntax analyzer)
from a context-free grammar (CFG) that
describes the syntax of the source language.
Compiler construction tools…
Syntax directed translation engines
Ex. Cornell Synthesizer Generator
It produces a collection of routines that walk
the parse tree and execute some tasks.
Automatic code generators
Take a collection of rules that define the
translation of the IC to target code and
produce a code generator.
This completes our brief description of the
phases of a compiler.