Chapter 1

Uploaded by Ermias Mesfin
Copyright © All Rights Reserved
Principles of Compiler Design

Chapter 1

Outline
• Introduction
• Programs related to compilers
• The translation process
  - Analysis
    - Lexical analysis
    - Syntax analysis
    - Semantic analysis
  - Synthesis
    - Intermediate code (IC) generator
    - IC optimizer
    - Code generator
    - Code optimizer
• Major data structures in a compiler
• Compiler construction tools

Introduction
What is a compiler?
• A program that reads a program written in one language (the source language) and translates it into an equivalent program in another language (the target language).
• Why do we design compilers?
• Why do we study compiler construction techniques?
  - Compilers provide an essential interface between applications and architectures
  - Compilers embody a wide range of theoretical techniques

[Figure: Source program (high-level language) → Compiler → Target program (assembly or machine language); the compiler also reports error messages. At run time: Input → Target program (exe) → Output.]

Introduction…
• Different platforms (hardware architectures together with their operating systems, such as Windows, Mac, and Unix) require different machine code, so most programs must be compiled separately for each platform.

[Figure: one program is passed through a separate compiler for each platform: Unix, Win, Mac.]
Programs related to compilers
• Interpreter
  - A program that reads a source program and executes it
  - Works by analyzing and executing the source program's commands one at a time
  - Does not translate the whole source program into object code
• Interpretation is important when:
  - The programmer is working in interactive mode and needs to view and update variables
  - Running speed is not important
  - Commands have simple formats, and thus can be quickly analyzed and executed
  - Modification of or addition to user programs is required as execution proceeds
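
The "one command at a time" behaviour described above can be sketched in a few lines of Python. The three-command language (`set`, `add`, `print`) and the function name `interpret` are invented for this illustration and are not part of the slides.

```python
def interpret(lines):
    """Execute a tiny made-up command language one line at a time."""
    env = {}      # variable name -> integer value
    output = []   # values produced by 'print' commands
    for line in lines:
        parts = line.split()
        if not parts:
            continue                      # ignore blank lines
        if parts[0] == "set":             # set <name> <integer>
            env[parts[1]] = int(parts[2])
        elif parts[0] == "add":           # add <name> <integer>
            env[parts[1]] += int(parts[2])
        elif parts[0] == "print":         # print <name>
            output.append(env[parts[1]])
        else:
            # Execution stops at the first faulty command.
            raise ValueError(f"unknown command: {line!r}")
    return output


print(interpret(["set x 4", "add x 2", "print x"]))  # [6]
```

Each line is analyzed and executed immediately, and no object code for the whole program is ever produced, which is the contrast with a compiler drawn in these slides.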

Programs related to compilers
• Interpreter and compiler

[Figure: Compiler: Source code → (compilation) → Exe code → (processing) → Machine. Interpreter: Source code → (compilation) → Intermediate code → (interpretation) → Interpreter.]
Programs related to compilers
Interpreter and compiler differences

Interpreter:
• Takes one statement, translates and executes it, and then takes the next statement.
• Stops translating as soon as it encounters the first error.
• Takes less time to analyze the source code.
• Overall execution is slower.

Compiler:
• Translates the entire program in one go; the translated program is then executed.
• Generates the error report after translating the entire program.
• Takes a large amount of time to analyze and process the high-level language code.
• Overall execution is faster.
Programs related to compilers…
• Interpreter…
• Well-known examples of interpreters:
  - BASIC interpreter, Lisp interpreter, UNIX shell command interpreter, SQL interpreter, Java interpreter…
• In principle, any programming language can be either interpreted or compiled:
  - Some languages are designed to be interpreted, others are designed to be compiled
• Interpreters involve large overheads:
  - Execution speed degradation can vary from 10:1 to 100:1
  - Substantial space overhead may be involved

E.g., Compiling Java Programs
• The Java compiler produces bytecode, not machine code
• Bytecode is converted into machine code using a Java interpreter
• You can run bytecode on any computer that has a Java interpreter installed

[Figure: Java program → compiler → Java bytecode → a Java interpreter on each platform (Win, Mac, Unix).]

Programs related to compilers…
• Assemblers
  - Translators for assembly language
  - Assembly code is translated into machine code
  - Output is relocatable machine code
• Linker
  - Links object files that were compiled or assembled separately
  - Links object files to standard library functions
  - Generates a file that can be loaded and executed

Programs related to compilers…
• Loader
  - Loads the executable code produced by the linker into main memory
• Pre-processors
  - A pre-processor is a separate program that is called by the compiler before actual translation begins
  - Such a pre-processor:
    - produces input to a compiler
    - can delete comments
    - performs macro processing (substitutions)
    - includes other files…

Programs related to compilers

[Figure: C or C++ program → Preprocessor → C or C++ program with macro substitutions and file inclusions → Compiler → Assembly code → Assembler → Relocatable object module → Linker (together with other relocatable object modules or library modules) → Executable code → Loader → Absolute machine code]

The translation process
• Internally, a compiler consists of a number of steps, or phases, that perform distinct logical operations.
• The phases of a compiler are shown in the next slide, together with three auxiliary components that interact with some or all of the phases:
  - the symbol table,
  - the literal table,
  - and the error handler.
• There are two important parts in the compilation process:
  - Analysis and
  - Synthesis.

The translation process…

[Figure: Source code → Scanner → Tokens → Parser → Syntax tree → Semantic analyzer → Annotated tree → Intermediate code generator → Intermediate code → Intermediate code optimizer → Intermediate code → Target code generator → Target code → Target code optimizer → Target code. The literal table, the symbol table, and the error handler interact with the phases.]

Analysis and Synthesis
Analysis (Front end)
Breaks up the source program into constituent pieces and creates an intermediate representation of the source program.
During analysis, the operations implied by the source program are determined and recorded in a hierarchical structure called a tree.
Synthesis (Back end)
The synthesis part constructs the desired target program from the intermediate representation.

Analysis of the source program
• Analysis consists of three phases:
  - Lexical analysis
  - Syntax analysis
  - Semantic analysis

1. Lexical Analysis or Scanning
• The stream of characters making up the source program is read from left to right and grouped into tokens.
• A token is a sequence of characters having a collective meaning.
• A lexical analyzer, also called a lexer or a scanner, receives a stream of characters from the source program and groups them into tokens.
• Examples of tokens:
  - Identifiers
  - Keywords
  - Symbols (+, -, …)
  - Numbers …
• Blanks, new lines, and tab characters are removed during lexical analysis.

[Figure: Source program → Lexical analyzer → Stream of tokens]

Lexical analysis or Scanning…
• Example: a[index] = 4 + 2;

  Lexeme   Token
  a        identifier
  [        left bracket
  index    identifier
  ]        right bracket
  =        assignment operator
  4        number
  +        plus operator
  2        number
  ;        semicolon

• A scanner may perform other operations along with the recognition of tokens:
  - It may enter identifiers into the symbol table, and
  - It may enter literals into the literal table.
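
A scanner for the example above can be sketched with Python's standard `re` module. The table `TOKEN_SPEC` and the function `tokenize` are names chosen for this illustration; the token class names are the ones used on the slide, and the sketch covers only the tokens that occur in the example.

```python
import re

# (token class, regular expression) pairs, tried in order at each position.
TOKEN_SPEC = [
    ("number",              r"\d+"),
    ("identifier",          r"[A-Za-z_]\w*"),
    ("left bracket",        r"\["),
    ("right bracket",       r"\]"),
    ("assignment operator", r"="),
    ("plus operator",       r"\+"),
    ("semicolon",           r";"),
    ("skip",                r"\s+"),   # blanks, tabs and new lines
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in TOKEN_SPEC]


def tokenize(text):
    """Group the characters of text into (lexeme, token class) pairs."""
    pos, tokens = 0, []
    while pos < len(text):
        for name, regex in COMPILED:
            match = regex.match(text, pos)
            if match:
                if name != "skip":          # whitespace is dropped
                    tokens.append((match.group(), name))
                pos = match.end()
                break
        else:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
    return tokens


for lexeme, kind in tokenize("a[index] = 4 + 2;"):
    print(lexeme, kind)
```

Scanner generators such as Lex build an equivalent recognizer automatically from the regular expressions; the loop here simply tries each pattern in turn.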

2. Syntax Analysis or Parsing
• The parser receives the source code in the form of tokens from the scanner and performs syntax analysis.
• The results of syntax analysis are usually represented by a parse tree or a syntax tree.
• Syntax tree: each interior node represents an operation, and the children of the node represent the arguments of the operation.
• The syntactic structure of a programming language is determined by a context-free grammar (CFG).

[Figure: Stream of tokens → Syntax analyzer → Abstract syntax tree]

Syntax analysis or Parsing…
Context-free syntax is specified with a grammar

Formally, a grammar G = (S, N, T, P), where:
• S is the start symbol
• N is a set of non-terminal symbols
• T is a set of terminal symbols or words
• P is a set of productions or rewrite rules (P : N → (N ∪ T)*)

Syntax analysis or Parsing…
Context-free syntax can be put to better use

1. Goal → Expr
2. Expr → Expr Op Term
3.      | Term
4. Term → number
5.      | id
6. Op   → +
7.      | -

S = Goal
T = { number, id, +, - }
N = { Goal, Expr, Term, Op }
P = { 1, 2, 3, 4, 5, 6, 7 }

• This grammar defines simple expressions with addition & subtraction over “number” and “id”
• This grammar, like many, falls in a class called “context-free grammars”, abbreviated CFG

Syntax analysis or Parsing…
Given a CFG, we can derive sentences by repeated substitution

1. Goal → Expr
2. Expr → Expr Op Term
3.      | Term
4. Term → number
5.      | id
6. Op   → +
7.      | -

A derivation of x + 2 - y:

Production   Result
             Goal
1            Expr
2            Expr Op Term
5            Expr Op y
7            Expr - y
2            Expr Op Term - y
4            Expr Op 2 - y
6            Expr + 2 - y
3            Term + 2 - y
5            x + 2 - y

To recognize a valid sentence in some CFG, we reverse this process and build up a parse
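
The derivation above always rewrites the rightmost non-terminal, so it can be replayed mechanically. The following sketch does that; `PRODUCTIONS`, `NONTERMINALS` and `derive` are names invented for this illustration, and the result is written with the token classes `id` and `number` where the slide shows the lexemes x, 2 and y.

```python
# The seven productions of the expression grammar, keyed by production number.
PRODUCTIONS = {
    1: ("Goal", ["Expr"]),
    2: ("Expr", ["Expr", "Op", "Term"]),
    3: ("Expr", ["Term"]),
    4: ("Term", ["number"]),
    5: ("Term", ["id"]),
    6: ("Op", ["+"]),
    7: ("Op", ["-"]),
}
NONTERMINALS = {"Goal", "Expr", "Term", "Op"}


def derive(steps):
    """Apply each numbered production to the rightmost non-terminal."""
    form = ["Goal"]                       # current sentential form
    history = [" ".join(form)]
    for number in steps:
        lhs, rhs = PRODUCTIONS[number]
        # Position of the rightmost non-terminal in the sentential form.
        i = max(k for k, symbol in enumerate(form) if symbol in NONTERMINALS)
        if form[i] != lhs:
            raise ValueError(f"production {number} does not rewrite {form[i]}")
        form[i:i + 1] = rhs               # substitute the right-hand side
        history.append(" ".join(form))
    return history


for line in derive([1, 2, 5, 7, 2, 4, 6, 3, 5]):
    print(line)
```

The last line printed is `id + number - id`, the token-class form of x + 2 - y.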

Syntax analysis or Parsing…
A parse can be represented by a tree (parse tree or syntax tree)

Parse tree for x + 2 - y (indentation shows parent and child):

Goal
  Expr
    Expr
      Expr
        Term
          <id,x>
      Op
        +
      Term
        <number,2>
    Op
      -
    Term
      <id,y>

The parse tree contains a lot of unneeded information

Syntax analysis or Parsing…
Compilers often use an abstract syntax tree (AST) instead of a parse tree

AST for x + 2 - y (indentation shows parent and child):

-
  +
    <id,x>
    <number,2>
  <id,y>

The AST summarizes grammatical structure, without including detail about the derivation

This is much more concise

ASTs are one kind of intermediate representation (IR)
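
As a rough illustration of how a parser can build such an AST directly, the sketch below parses a token list for the expression grammar. It is an assumption of this example, not the slides' method: the left-recursive rule Expr → Expr Op Term is replaced by the usual equivalent loop "Term followed by any number of Op Term", and the AST is represented as nested tuples. The name `parse_expr` and the `("op", "+")` token format are hypothetical.

```python
def parse_expr(tokens):
    """Parse Term (Op Term)* and return a left-associative AST.

    tokens is a list of (kind, lexeme) pairs, e.g. ("id", "x").
    Leaves are (kind, lexeme); interior nodes are (operator, left, right).
    """
    def parse_term(i):
        if i >= len(tokens):
            raise SyntaxError("expected id or number, got end of input")
        kind, lexeme = tokens[i]
        if kind not in ("id", "number"):
            raise SyntaxError(f"expected id or number, got {lexeme!r}")
        return (kind, lexeme), i + 1

    node, i = parse_term(0)
    while i < len(tokens):
        kind, op = tokens[i]
        if kind != "op" or op not in ("+", "-"):
            raise SyntaxError(f"expected + or -, got {op!r}")
        right, i = parse_term(i + 1)
        node = (op, node, right)          # earlier operators end up deeper
    return node


tokens = [("id", "x"), ("op", "+"), ("number", "2"), ("op", "-"), ("id", "y")]
print(parse_expr(tokens))
# ('-', ('+', ('id', 'x'), ('number', '2')), ('id', 'y'))
```

The printed tuple has the same shape as the AST on the slide: the root is the subtraction, whose left child is the addition of x and 2.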


3. Semantic Analysis
• The semantics of a program are its meaning, as opposed to its syntax or structure
• The semantics consist of:
  - Runtime semantics: the behavior of the program at runtime
  - Static semantics: checked by the compiler
• Static semantics include:
  - Declaring variables and constants before use
  - Calling functions that exist (predefined in a library or defined by the user)
  - Passing parameters properly
  - Type checking
• The semantic analyzer does the following:
  - Checks the static semantics of the language
  - Annotates the syntax tree with type information
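
A minimal sketch of both jobs (checking and annotating) for the expression AST used earlier is given below. The tuple representation, the function name `annotate`, the rule that both operands must have the same type, and the choice of `int` as the type of every number are all simplifying assumptions made for this example.

```python
def annotate(node, symbols):
    """Return a copy of the AST in which every node carries a type.

    node is ("id", name), ("number", text) or (operator, left, right);
    symbols maps each declared variable name to its type name.
    """
    if node[0] == "id":
        if node[1] not in symbols:        # declaration-before-use check
            raise NameError(f"{node[1]} is used before it is declared")
        return node + (symbols[node[1]],)
    if node[0] == "number":
        return node + ("int",)
    op, left, right = node
    left, right = annotate(left, symbols), annotate(right, symbols)
    if left[-1] != right[-1]:             # type check on the operands
        raise TypeError(f"operands of {op} have types {left[-1]} and {right[-1]}")
    return (op, left, right, left[-1])


ast = ("-", ("+", ("id", "x"), ("number", "2")), ("id", "y"))
print(annotate(ast, {"x": "int", "y": "int"}))
```

With `y` missing from the symbol mapping the call raises `NameError`, and with `y` declared as `"float"` it raises `TypeError`; a real compiler would report these through its error handler instead of stopping.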

Synthesis of the target program
• Intermediate code generator/representation
• Intermediate code optimizer
• The target code generator
• The target code optimizer

Intermediate Code Generator/Representation
• Comes after syntax and semantic analysis
• Separates the compiler front end from its back end
• An intermediate representation should have two important properties:
  - It should be easy to produce
  - It should be easy to translate into the target program
• An intermediate representation can take a variety of forms:
  - Three-address code, AST, or DAG representation

[Figure: Abstract syntax → Intermediate code generator → Intermediate code]

• Three-address code for the original C expression a[index] = 4 + 2 is:

t1 = 2
t2 = 4 + t1
a[index] = t2
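
One simple way to obtain three-address code is a post-order walk of the AST that introduces a fresh temporary for every operator. The sketch below assumes the nested-tuple AST form `(operator, left, right)` with leaves `("id", name)` or `("number", text)`; the function name `to_three_address` is hypothetical. Unlike the slide's listing, it uses constants directly as operands instead of first copying them into temporaries.

```python
import itertools


def to_three_address(target, node):
    """Flatten an expression AST into a list of three-address instructions."""
    code = []
    temps = (f"t{n}" for n in itertools.count(1))   # t1, t2, t3, ...

    def emit(n):
        if n[0] in ("id", "number"):
            return n[1]                   # leaves are used as operands directly
        op, left, right = n
        a, b = emit(left), emit(right)    # children first (post-order)
        t = next(temps)
        code.append(f"{t} = {a} {op} {b}")
        return t

    code.append(f"{target} = {emit(node)}")
    return code


for line in to_three_address("a[index]", ("+", ("number", "4"), ("number", "2"))):
    print(line)
# t1 = 4 + 2
# a[index] = t1
```

Every generated instruction has at most one operator on its right-hand side, which is the defining property of three-address code.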

Code Generator
• The machine code generator receives the (optimized) intermediate code and produces either:
  - Machine code for a specific machine, or
  - Assembly code for a specific machine and assembler.
• The code generator:
  - Selects appropriate machine instructions (instruction selection)
  - Allocates memory locations for variables and registers for intermediate computations (register allocation)
  - Decides the order in which the instructions are issued (instruction scheduling)

Code generator…
• The code generator takes the IR code and generates code for the target machine.
• Here we will write target code in assembly language:

a ← b × c + d
e ← f + b × c + d

becomes

load  @b     → r1
load  @c     → r2
mult  r1,r2  → r3
load  @d     → r4
add   r3,r4  → r5     (computes b × c + d)
store r5     → @a
load  @f     → r6
add   r5,r6  → r7     (reuses b × c + d)
store r7     → @e
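
A naive code generator for a single assignment can be sketched as a tree walk that loads each variable into a fresh register and emits one instruction per operator. This is only an illustration under several assumptions: the AST is the nested-tuple form `(operator, left, right)` with `("id", name)` leaves, registers are unlimited, `->` stands in for the arrow used on the slide, and the `sub` mnemonic and the names `OPCODES` and `generate` are invented here.

```python
import itertools

OPCODES = {"+": "add", "-": "sub", "*": "mult"}   # 'sub' is an assumed mnemonic


def generate(target, node):
    """Emit load/operate/store instructions for target = expression."""
    code = []
    regs = (f"r{n}" for n in itertools.count(1))  # r1, r2, ... (never reused)

    def emit(n):
        if n[0] == "id":
            r = next(regs)
            code.append(f"load @{n[1]} -> {r}")
            return r
        op, left, right = n
        a, b = emit(left), emit(right)
        r = next(regs)
        code.append(f"{OPCODES[op]} {a},{b} -> {r}")
        return r

    code.append(f"store {emit(node)} -> @{target}")
    return code


ast = ("+", ("*", ("id", "b"), ("id", "c")), ("id", "d"))   # b * c + d
print("\n".join(generate("a", ast)))
```

For a = b * c + d this reproduces the first six instructions of the listing above. It does not notice that a later statement could reuse r5; recognizing that kind of reuse is the job of the optimizer and the register allocator.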

The Optimizer (or Middle End)
Traditional Three-part Compiler

[Figure: Source code → Front end → IR → Optimizer (middle end) → IR → Back end → Machine code; each part may report errors.]

Code Improvement (or Optimization)
• Analyzes IR and rewrites (or transforms) IR
• Primary goal is to reduce running time of the compiled code
  - May also improve space, power consumption, …
• Must preserve “meaning” of the code
  - Measured by values of named variables

The Optimizer (or Middle End)

[Figure: IR → Opt 1 → IR → Opt 2 → IR → Opt 3 → IR → … → Opt n → IR; each pass may report errors.]

Modern optimizers are structured as a series of passes

Typical Transformations
• Discover & propagate some constant value
• Move a computation to a less frequently executed place
• Specialize some computation based on context
• Discover a redundant computation & remove it
• Remove useless or unreachable code
• Encode an idiom in some particularly efficient form
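
The first transformation in the list can be illustrated with a single pass over straight-line three-address code. The sketch assumes an instruction format invented for this example: `(dest, value)` for a copy and `(dest, a, op, b)` for an operation, where operands are Python ints or names. The names `FOLD` and `propagate_constants` are hypothetical, and branches and loops are deliberately not handled.

```python
import operator

FOLD = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def propagate_constants(code):
    """Propagate known constants and fold operations on two constants."""
    known = {}                            # name -> constant value it holds
    out = []
    for instr in code:
        dest, args = instr[0], instr[1:]
        if len(args) == 1:                # copy: dest = a
            result = known.get(args[0], args[0])
            new = (dest, result)
        else:                             # operation: dest = a op b
            a = known.get(args[0], args[0])
            b = known.get(args[2], args[2])
            op = args[1]
            if isinstance(a, int) and isinstance(b, int):
                result = FOLD[op](a, b)   # fold at compile time
                new = (dest, result)
            else:
                result = None
                new = (dest, a, op, b)
        if isinstance(result, int):
            known[dest] = result          # dest now holds a known constant
        else:
            known.pop(dest, None)         # dest was overwritten by an unknown
        out.append(new)
    return out


code = [("t1", 2), ("t2", 4, "+", "t1"), ("a[index]", "t2")]
print(propagate_constants(code))
# [('t1', 2), ('t2', 6), ('a[index]', 6)]
```

After this pass t1 and t2 are no longer used by the final assignment, so a later "remove useless code" pass could delete them, which shows why optimizers are organized as a series of passes.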

Major Data Structures in a Compiler…
• Symbol Table
  - Keeps information associated with all kinds of tokens:
    - Identifiers, numbers, variables, functions, parameters, types, fields, etc.
  - Tokens are entered by the scanner and parser
  - The semantic analyzer adds type information and other attributes
  - Code generation and optimization phases use the information in the symbol table
• Performance Issues
  - Insertion, deletion, and search operations need to be efficient because they are frequent
  - More than one symbol table may be used
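
One common way to meet these performance requirements is a hash table per scope. The sketch below uses Python dictionaries, which give average constant-time insertion and lookup; the class `SymbolTable` and its method names are invented for this illustration, and a stack of dictionaries is only one of several possible designs.

```python
class SymbolTable:
    """Symbol table with nested scopes, one dictionary per scope."""

    def __init__(self):
        self.scopes = [{}]                # the innermost scope is the last one

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()                 # drops all symbols of the scope at once

    def insert(self, name, **attributes):
        if name in self.scopes[-1]:
            raise KeyError(f"{name} is already declared in this scope")
        self.scopes[-1][name] = attributes

    def lookup(self, name):
        # Search from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


table = SymbolTable()
table.insert("x", type="int")
table.enter_scope()
table.insert("x", type="float")           # shadows the outer x
print(table.lookup("x"))                  # {'type': 'float'}
table.exit_scope()
print(table.lookup("x"))                  # {'type': 'int'}
```

The attribute dictionary stored for each name is where the semantic analyzer would add type information and where later phases would read it.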

Major Data Structures in a Compiler…
• Literal Table
  - Stores constant values and string literals in a program.
  - One literal table applies globally to the entire program.
  - Used by the code generator to:
    - Assign addresses for literals.
  - Avoids the replication of constants and strings.
  - Quick insertion and lookup are essential.
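
A literal table with these properties can be sketched as a list of distinct literals plus a dictionary index for quick lookup. The class `LiteralTable`, the method `intern`, and the use of a list position as the literal's "address" are assumptions made for this example.

```python
class LiteralTable:
    """Program-wide table that stores each distinct literal exactly once."""

    def __init__(self):
        self._index = {}                  # (type name, value) -> slot number
        self.literals = []                # slot number -> literal value

    def intern(self, value):
        """Return the slot of value, adding it only if it is new."""
        key = (type(value).__name__, value)   # keeps 1 and 1.0 apart
        if key not in self._index:
            self._index[key] = len(self.literals)
            self.literals.append(value)
        return self._index[key]


table = LiteralTable()
print(table.intern("hello"))   # 0
print(table.intern(42))        # 1
print(table.intern("hello"))   # 0 again: the string is not replicated
```

A code generator could translate the slot numbers into addresses in a constant-data area of the target program.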

Compiler construction tools
• Various tools are used in the construction of the various parts of a compiler.
• Scanner generators
  - Ex. Lex, flex, JLex
  - These tools generate a scanner (lexical analyzer) from regular expressions that describe the tokens.
• Parser generators
  - Ex. Yacc, Bison
  - These tools produce a parser (syntax analyzer) from a context-free grammar (CFG) that describes the syntax of the source language.

Compiler construction tools…
• Syntax-directed translation engines
  - Ex. Cornell Synthesizer Generator
  - They produce a collection of routines that walk the parse tree and execute some tasks.
• Automatic code generators
  - Take a collection of rules that define the translation of the IC to target code and produce a code generator.
• This completes our brief description of the phases of a compiler.