Lecture 2

The document discusses the process of compilation, describing how compilers translate programs from high-level source code to low-level machine code. It explains the major phases of a compiler including lexical analysis, parsing, semantic analysis, code optimization, code generation, and the use of symbol tables and intermediate representations.

What are Compilers?

• Translates from one representation of the program to another
• Typically from high-level source code to low-level machine code or object code
• Source code is normally optimized for human readability
– Expressive: matches our notion of languages (and application?!)
– Redundant to help avoid programming errors
• Machine code is optimized for hardware
– Redundancy is reduced
– Information about the intent is lost
Compiler as a Translator
[Figure: High-level program → Compiler → Low-level code]
Goals of translation
• Good compile-time performance
• Good performance for the generated code
• Correctness
– A very important issue
– Can compilers be proven to be correct?
• Tedious even for toy compilers! Undecidable in general.
– However, correctness has implications for the development cost
How to translate?
• Direct translation is difficult. Why?
• Source code and machine code mismatch in level of abstraction
– Variables vs memory locations/registers
– Functions vs jump/return
– Parameter passing
– structs
• Some languages are farther from machine code than others
– For example, languages supporting the object-oriented paradigm
How to translate easily?
• Translate in steps. Each step handles a reasonably simple, logical, and well-defined task
• Design a series of program representations
• Intermediate representations should be amenable to program manipulation of various kinds (type checking, optimization, code generation etc.)
• Representations become more machine specific and less language specific as the translation proceeds
The first few steps
• The first few steps can be understood by analogy with how humans comprehend a natural language
• The first step is recognizing/knowing the alphabet of a language. For example
– English text consists of lower and upper case letters, digits, punctuation and white space
– Written programs consist of characters from the ASCII character set (normally 9-13, 32-126)
The first few steps
• The next step in understanding the sentence is recognizing words
– How to recognize English words?
– Words found in standard dictionaries
– Dictionaries are updated regularly
The first few steps
• How to recognize words in a programming language?
– A dictionary (of keywords etc.)
– Rules for constructing words (identifiers, numbers etc.)
• This is called lexical analysis
• Recognizing words is not completely trivial. For example:
what is this sentence?
Lexical Analysis: Challenges
• We must know what the word separators are
• The language must define rules for breaking a sentence into a sequence of words
• Normally white space and punctuation are word separators in languages
Lexical Analysis: Challenges
• In programming languages a character from a different class may also be treated as a word separator
• The lexical analyzer breaks a sentence into a sequence of words or tokens:
– If a == b then a = 1 ; else a = 2 ;
– Sequence of words (total 14 words)
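The lexical analysis described above can be sketched with a small regular-expression tokenizer. This is a minimal illustration, not a definitive design: the token class names (NUMBER, IDENT, OP, SEMI, KEYWORD), the keyword set, and the choice to match keywords case-insensitively (so that "If" counts as a keyword) are all assumptions made for this example.

```python
import re

# Hypothetical token classes for the slide's toy language; the names are
# illustrative and not taken from any particular compiler.
KEYWORDS = {"if", "then", "else"}
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"==|="),  # "==" is listed first so it is not split into two "="
    ("SEMI", r";"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Split text into (kind, lexeme) pairs, dropping white space."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue
        # Assumption: keywords are matched case-insensitively.
        if kind == "IDENT" and lexeme.lower() in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, lexeme))
    return tokens


tokens = tokenize("If a == b then a = 1 ; else a = 2 ;")
print(len(tokens))  # 14, matching the count on the slide
```

Because "=" belongs to a different character class than letters, tokenize("a==b") also yields three tokens even though the input contains no white space.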
The next step
• Once the words are understood, the next step is to understand the structure of the sentence
• The process is known as syntax checking or parsing
Parsing
• Parsing a program is exactly the same process as shown on the previous slide.
• Consider the statement
if x == y then z = 1 else z = 2
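To make the structure of the if statement above concrete, here is a minimal recursive-descent sketch that turns it into a nested tuple. The grammar it accepts (one comparison, one assignment per branch, tokens separated by white space) and the function name parse_if are assumptions for illustration only; a real parser would work on the lexer's tokens and cover a full grammar.

```python
KEYWORDS = {"if", "then", "else"}


def parse_if(source):
    """Parse 'if A == B then X = V else Y = W' into a nested tuple tree."""
    words = source.split()  # assumes tokens are separated by white space
    pos = 0

    def peek():
        return words[pos] if pos < len(words) else "end of input"

    def expect(word):
        nonlocal pos
        if peek() != word:
            raise SyntaxError(f"expected {word!r}, found {peek()!r}")
        pos += 1

    def operand():
        nonlocal pos
        word = peek()
        if pos >= len(words) or word in KEYWORDS or not word.isalnum():
            raise SyntaxError(f"expected a name or number, found {word!r}")
        pos += 1
        return word

    def condition():
        left = operand()
        expect("==")
        return ("==", left, operand())

    def assignment():
        target = operand()
        expect("=")
        return ("=", target, operand())

    expect("if")
    cond = condition()
    expect("then")
    then_branch = assignment()
    expect("else")
    else_branch = assignment()
    if pos != len(words):
        raise SyntaxError(f"unexpected trailing input {peek()!r}")
    return ("if", cond, then_branch, else_branch)


print(parse_if("if x == y then z = 1 else z = 2"))
# ('if', ('==', 'x', 'y'), ('=', 'z', '1'), ('=', 'z', '2'))
```

A sentence that does not fit the structure, such as one with the else branch missing, is rejected with a SyntaxError.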
Understanding the meaning
• Once the sentence structure is understood we try to understand the meaning of the sentence (semantic analysis)
• A challenging task
• Example:
Qasim said Hashir left his assignment at home
• What does "his" refer to? Qasim or Hashir?
Understanding the meaning
• A worse case
Qasim said Qasim left his assignment at home
• Even worse
Qasim said Qasim left Qasim’s assignment at home
• How many Qasims are there? Which one left the assignment? Whose assignment got left?
Semantic Analysis
• Too hard for compilers. They do not have capabilities similar to human understanding
• However, compilers do perform analysis to understand the meaning and catch inconsistencies
• Programming languages define strict rules to avoid such ambiguities
{ int Qasim = 3;
  { int Qasim = 4;
    cout << Qasim;   // prints 4: the inner Qasim shadows the outer one
  }
}
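The binding rule behind the nested-block example can be sketched as a chain of scopes, where a lookup starts in the innermost block and walks outward. The class name Scope and its methods are hypothetical names chosen for this sketch, and the rule shown (innermost declaration wins) is the usual block-scoping rule rather than something spelled out on the slide.

```python
class Scope:
    """One block's declarations, linked to the enclosing block."""

    def __init__(self, parent=None):
        self.parent = parent
        self.names = {}

    def declare(self, name, value):
        if name in self.names:
            raise NameError(f"{name!r} is already declared in this scope")
        self.names[name] = value

    def lookup(self, name):
        # Search the innermost scope first, then each enclosing scope.
        scope = self
        while scope is not None:
            if name in scope.names:
                return scope.names[name]
            scope = scope.parent
        raise NameError(f"{name!r} is not declared")


outer = Scope()
outer.declare("Qasim", 3)
inner = Scope(parent=outer)
inner.declare("Qasim", 4)
print(inner.lookup("Qasim"))  # 4: the inner declaration shadows the outer one
print(outer.lookup("Qasim"))  # 3
```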
More on Semantic Analysis
• Compilers perform many other checks besides variable bindings
• Type checking
Qasim left her work at home
• There is a type mismatch between "her" and "Qasim". Presumably Qasim is male, so they are not the same person.
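In a programming language the analogous check compares the declared type of a name with the type of the value assigned to it. The sketch below is a deliberately tiny illustration under stated assumptions: the type names ("int", "float", "string"), the function check_assignment, and the restriction to literal values are all invented for this example.

```python
# Assumed mapping from Python literal types to the toy language's type names.
LITERAL_TYPES = {int: "int", float: "float", str: "string"}


def check_assignment(declared, target, literal):
    """Return the literal's type name if it may be assigned to target."""
    if target not in declared:
        raise NameError(f"{target!r} is not declared")
    actual = LITERAL_TYPES.get(type(literal))
    if actual is None:
        raise TypeError(f"unsupported literal {literal!r}")
    if actual != declared[target]:
        raise TypeError(
            f"type mismatch: {target!r} is {declared[target]}, value is {actual}"
        )
    return actual


declared = {"count": "int", "name": "string"}
print(check_assignment(declared, "count", 7))  # int
```

Assigning a string literal to count would raise a TypeError, which is the compiler's counterpart of noticing that "her" does not agree with "Qasim".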
Compiler structure once again

Code Optimization
• No strong counterpart in English, but it is similar to editing/précis writing
• Automatically modify programs so that they
– Run faster
– Use fewer resources (memory, registers, space, fewer fetches etc.)
Code Optimization
• Some common optimizations
– Common sub-expression elimination
– Copy propagation
– Dead code elimination
– Code motion
– Strength reduction
– Constant folding
• Example: x = 15 * 3 is transformed to x = 45
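Constant folding, the last optimization in the list above, can be sketched on Python's own syntax trees using the standard ast module. This is only an illustration of the idea: the class name ConstantFolder and the helper fold are invented here, only +, - and * on numeric constants are handled, and ast.unparse requires Python 3.9 or later.

```python
import ast
import operator

# Operators this sketch is willing to evaluate at "compile time".
FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first (bottom-up)
        op = FOLDABLE.get(type(node.op))
        if (
            op is not None
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
            and isinstance(node.left.value, (int, float))
            and isinstance(node.right.value, (int, float))
        ):
            folded = ast.Constant(op(node.left.value, node.right.value))
            return ast.copy_location(folded, node)
        return node


def fold(source):
    """Return source with constant sub-expressions replaced by their values."""
    tree = ConstantFolder().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


print(fold("x = 15 * 3"))  # x = 45
```

Sub-expressions that involve a variable are left alone, so fold("y = 2 * 3 + z") gives y = 6 + z.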
Example of Optimizations
[Figure: example of optimizations. Legend: A = assignment, M = multiplication, D = division, E = exponent]
Code Generation
• Usually a two-step process
– Generate intermediate code from the semantic representation of the program
– Generate machine code from the intermediate code
• The advantage is that each phase is simple
• Requires design of an intermediate language
Code Generation
• Most compilers perform translation between successive intermediate representations
• Intermediate languages are generally ordered in decreasing level of abstraction from highest (source) to lowest (machine)
Code Generation
• Abstractions at the source level: identifiers, operators, expressions, statements, conditionals, iteration, functions (user defined, system defined or libraries)
• Abstractions at the target level: memory locations, registers, stack, opcodes, addressing modes, system libraries, interface to the operating system
• Code generation is a mapping from source-level abstractions to target machine abstractions
Code Generation
• Map identifiers to locations (memory/storage allocation)
• Explicate variable accesses (change identifier references to relocatable/absolute addresses)
• Map source operators to opcodes or a sequence of opcodes
Code Generation

• Convert conditionals and iterations to test/jump or compare instructions
• Lay out parameter passing protocols: locations for parameters, return values
• Interface calls to library, runtime system, operating system
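The mappings listed on the code generation slides can be sketched for the earlier statement if x == y then z = 1 else z = 2. Everything target-specific here is an assumption made for illustration: the mnemonics (LOAD, STORE, CMP, JNE, JMP), the single register r0, the 4-byte slot size, and the label names do not correspond to any real machine.

```python
def allocate(names, base=0, size=4):
    """Map each identifier to a memory offset (storage allocation)."""
    return {name: base + i * size for i, name in enumerate(names)}


def operand(token, address):
    """Numbers become immediates; identifiers become memory references."""
    return f"#{token}" if token.isdigit() else f"[{address[token]}]"


def gen_assign(stmt, address):
    _, target, value = stmt
    return [f"LOAD r0, {operand(value, address)}", f"STORE r0, [{address[target]}]"]


def gen_if(tree, address):
    """Turn ('if', ('==', a, b), then_stmt, else_stmt) into compare/jump code."""
    _, (_, left, right), then_stmt, else_stmt = tree
    code = [
        f"LOAD r0, {operand(left, address)}",
        f"CMP r0, {operand(right, address)}",
        "JNE else_0",  # the source-level "==" becomes compare plus jump
    ]
    code += gen_assign(then_stmt, address)
    code += ["JMP end_0", "else_0:"]
    code += gen_assign(else_stmt, address)
    code += ["end_0:"]
    return code


address = allocate(["x", "y", "z"])
tree = ("if", ("==", "x", "y"), ("=", "z", "1"), ("=", "z", "2"))
print("\n".join(gen_if(tree, address)))
```

The identifiers x, y and z no longer appear in the output; they have been replaced by the offsets 0, 4 and 8, which is the "explicate variable accesses" step.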
Post translation Optimizations

• Algebraic transformations and reordering
– Remove/simplify operations like
• Multiplication by 1
• Multiplication by 0
• Addition with 0
– Reorder instructions based on
• Commutative properties of operators
• For example x+y is the same as y+x
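The algebraic simplifications above can be sketched on a small expression tree. The representation (an int, a variable name, or a tuple (op, left, right)) and the function name simplify are assumptions for this example, and the rules assume integer arithmetic on side-effect-free operands, since x * 0 = 0 is not safe for every type.

```python
def simplify(expr):
    """Apply x*1 = x, x*0 = 0 and x+0 = x to an expression tree.

    expr is an int, a variable name (str), or a tuple (op, left, right)
    with op in {"+", "*"}.
    """
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr
    left, right = simplify(left), simplify(right)
    if isinstance(left, int) and isinstance(right, int):
        return left + right if op == "+" else left * right
    # Both operators are commutative, so a constant on the left can be
    # moved to the right and each rule needs to be tested only once.
    if isinstance(left, int):
        left, right = right, left
    if op == "*" and right == 1:
        return left
    if op == "*" and right == 0:
        return 0
    if op == "+" and right == 0:
        return left
    return (op, left, right)


print(simplify(("+", ("*", "y", 0), ("*", 1, "x"))))  # x
```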
Post translation Optimizations

• Instruction selection
– Addressing mode selection
– Opcode selection
– Peephole optimization
Compiler structure

Something is missing
• Information required about the program variables during compilation
– Class of variable: keyword, identifier etc.
– Type of variable: integer, float, array, function etc.
– Amount of storage required
– Address in the memory
– Scope information
• Location to store this information
– Attributes with the variable (has obvious problems)
– At a central repository, which every phase refers to whenever information is required
• Normally the second approach is preferred
– Use a data structure called the symbol table
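A symbol table holding the attributes listed above can be sketched as a dictionary of records. The field names, the (scope, name) key, and the simple "next free address" allocation are assumptions made for this sketch; real compilers organize their symbol tables in many different ways.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    name: str
    kind: str      # class of the name, e.g. "variable" or "function"
    type: str      # e.g. "int", "float", "array"
    size: int      # amount of storage required, in bytes
    address: int   # offset assigned by the table
    scope: str     # name of the enclosing scope


class SymbolTable:
    """Central repository that every compiler phase can consult."""

    def __init__(self):
        self._symbols = {}
        self._next_address = 0

    def insert(self, name, kind, type_, size, scope):
        key = (scope, name)
        if key in self._symbols:
            raise KeyError(f"{name!r} is already declared in scope {scope!r}")
        symbol = Symbol(name, kind, type_, size, self._next_address, scope)
        self._next_address += size  # the next symbol goes right after this one
        self._symbols[key] = symbol
        return symbol

    def lookup(self, name, scope):
        return self._symbols[(scope, name)]


table = SymbolTable()
table.insert("count", "variable", "int", 4, "global")
table.insert("ratio", "variable", "float", 8, "global")
print(table.lookup("ratio", "global").address)  # 4
```

Keeping these attributes in one place means the type checker, the storage allocator and the code generator all read the same record instead of each carrying its own copy.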
Final Compiler structure

Advantages of the model
• Also known as the Analysis-Synthesis model of compilation
– Front end phases are known as analysis phases
– Back end phases are known as synthesis phases
• Each phase has a well-defined task
• Each phase handles a logical activity in the process of compilation
Advantages of the model


• Compiler is re-targetable
• Source- and machine-independent code optimization is possible
• Optimization phase can be inserted after the front and back end phases have been developed and deployed
