Lecture 3:
Compiler Construction
CS 401
DR. ABDUL MAJID, DCIS
Previous Lecture 2
IN THE PREVIOUS LECTURE WE DISCUSSED
• TWO MAIN PARTS: FRONT END AND BACK END
• COMPILER TYPES: TWO-PASS COMPILER
• COMPILER ANALYSIS: FRONT END AND BACK END
• COMPILER COMPONENTS: SCANNER: LEX., REGULAR EXPRESSIONS, TOKENS, LEXEMES
• PARSER: CONTEXT-FREE GRAMMARS
• ABSTRACT SYNTAX TREES
• COMPILER BACK END
Today: Lecture 3 Contents
• COMPILER TYPES: THREE-PASS COMPILER
• OPTIMIZER
• LEXICAL ANALYSIS, TOKENS, AD-HOC LEXER
Three-pass Compiler
Source code -> Front End -> IR -> Middle End -> IR -> Back End -> machine code
(each stage may report errors)
The middle end is an intermediate stage for code improvement or optimization.
Analyzes IR and rewrites (or
transforms) IR
Primary goal is to reduce running
time of the compiled code
May also improve space usage,
power consumption, ...
Must preserve the “meaning” of the code,
as measured by the values of named variables.
Optimizer
IR -> Opt 1 -> IR -> Opt 2 -> IR -> Opt 3 -> ... -> Opt n -> IR
(each pass may report errors)
Modern optimizers are structured
as a series of passes
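The pass structure can be sketched in a few lines. This is only an illustration under stated assumptions: the toy IR (one string per instruction, such as "t1 = 2 + 3"), the class name PassPipelineSketch, and the two passes foldConstants and removeSelfCopies are hypothetical and not part of the lecture.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical toy IR: each instruction is a string such as "t1 = 2 + 3".
class PassPipelineSketch {

    // Pass 1 (assumed for illustration): fold "x = <int> + <int>" into "x = <sum>".
    static List<String> foldConstants(List<String> ir) {
        List<String> out = new ArrayList<>();
        for (String instr : ir) {
            String[] p = instr.split(" ");            // e.g. [t1, =, 2, +, 3]
            if (p.length == 5 && p[3].equals("+")
                    && p[2].matches("\\d+") && p[4].matches("\\d+")) {
                int sum = Integer.parseInt(p[2]) + Integer.parseInt(p[4]);
                out.add(p[0] + " = " + sum);
            } else {
                out.add(instr);                       // leave everything else unchanged
            }
        }
        return out;
    }

    // Pass 2 (assumed for illustration): drop useless copies of the form "x = x".
    static List<String> removeSelfCopies(List<String> ir) {
        List<String> out = new ArrayList<>();
        for (String instr : ir) {
            String[] p = instr.split(" ");
            boolean selfCopy = p.length == 3 && p[0].equals(p[2]);
            if (!selfCopy) out.add(instr);
        }
        return out;
    }

    // The optimizer is the passes applied in order: IR -> Opt 1 -> IR -> Opt 2 -> IR.
    static List<String> optimize(List<String> ir) {
        List<UnaryOperator<List<String>>> passes = new ArrayList<>();
        passes.add(PassPipelineSketch::foldConstants);
        passes.add(PassPipelineSketch::removeSelfCopies);
        for (UnaryOperator<List<String>> pass : passes) {
            ir = pass.apply(ir);                      // each pass consumes IR and produces IR
        }
        return ir;
    }

    public static void main(String[] args) {
        System.out.println(optimize(Arrays.asList("t1 = 2 + 3", "x = x", "y = t1 + z")));
    }
}
```

Because every pass has the same IR-in, IR-out shape, passes can be added, removed, or reordered without touching the others.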
Typical transformations
Discover & propagate some
constant value
Move a computation to a less
frequently executed place
Specialize some computation based
on context
Discover a redundant
computation & remove it
Remove useless or unreachable
code
Encode an idiom in some
particularly efficient form
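Several of these transformations can be shown on one small function. The sketch below applies them by hand at the source level purely for illustration; a real optimizer performs such rewrites on the IR. The class and method names are hypothetical. Both versions must return the same value for every input, which is the "meaning" that has to be preserved.

```java
// Source-level illustration only; names are hypothetical.
class TransformationSketch {

    // Before: written naively.
    static int before(int[] a, int k) {
        int scale = 4;                        // a constant value
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            int limit = k * k + 1;            // loop-invariant: same value on every iteration
            int x = a[i] * scale;             // uses a known constant
            int y = a[i] * scale;             // redundant: always equal to x
            if (false) { sum = -1; }          // unreachable code
            sum += (x + y) % limit;
        }
        return sum;
    }

    // After: the same function with the transformations applied by hand.
    static int after(int[] a, int k) {
        int limit = k * k + 1;                // moved to a less frequently executed place
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            int x = a[i] << 2;                // constant 4 propagated; multiply encoded as a shift
            sum += (x + x) % limit;           // redundant computation of y removed
        }                                     // unreachable branch removed
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {3, -7, 0, 12, 100};
        System.out.println(before(data, 5) == after(data, 5));
    }
}
```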
Role of Run-time System
• Memory management
  • Allocate/deallocate
  • Garbage collection
• Run-time type checking
• Error/exception processing
• Interface to OS – I/O
• Support for parallelism
  • Parallel threads
  • Communication and synchronization
Related to Compilers
• Interpreters (direct execution)
• Assemblers
• Preprocessors
• Text formatters (non-WYSIWYG)
• Analysis tools
Lexical Analysis
Recall: Front-End
source code -> scanner -> tokens -> parser -> IR
(both stages may report errors)
Output of lexical analysis is a
stream of tokens
Tokens
Example:
if( i == j )
    z = 0;
else
    z = 1;
Input is just a sequence of characters (\b denotes a blank):
i f ( \b i \b = = \b j \b ) \n \t ....
Goal:
• partition the input string into substrings
• classify them according to their role
A token is a syntactic category.
Natural language: “He wrote the program”
Words: “He”, “wrote”, “the”, “program”
Programming language: “if(b == 0) a = b”
Words: “if”, “(”, “b”, “==”, “0”, “)”, “a”, “=”, “b”
Identifiers: x y11 maxsize
Keywords: if else while for
Integers: 2 1000 -44 5L
Floats: 2.0 0.0034 1e5
Symbols: ( ) + * / { } < > ==
Strings: “enter x” “error”
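The classification step can be sketched as a function from a lexeme to its token class. The regular expressions below are assumptions chosen to cover the examples listed above (a single "=" is added to the symbols because it appears in the earlier code example); they are not a definition of any particular language, and the class name TokenClassSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class TokenClassSketch {
    static final Set<String> KEYWORDS =
            new HashSet<>(Arrays.asList("if", "else", "while", "for"));

    // Returns the token class of a single, already-separated lexeme.
    static String classify(String lexeme) {
        if (KEYWORDS.contains(lexeme)) return "KEYWORD";       // checked before identifiers
        if (lexeme.matches("[A-Za-z_][A-Za-z0-9_]*")) return "IDENTIFIER";
        if (lexeme.matches("-?\\d+L?")) return "INTEGER";      // optional sign and L suffix
        if (lexeme.matches("-?(\\d+\\.\\d+([eE][-+]?\\d+)?|\\d+[eE][-+]?\\d+)")) return "FLOAT";
        if (lexeme.matches("\"[^\"]*\"")) return "STRING";     // text between double quotes
        if (lexeme.matches("==|[()+*/{}<>=]")) return "SYMBOL";
        return "UNKNOWN";
    }

    public static void main(String[] args) {
        for (String s : new String[] {"if", "y11", "5L", "1e5", "==", "\"error\""}) {
            System.out.println(s + " : " + classify(s));
        }
    }
}
```

The keyword test comes first because every keyword also matches the identifier pattern.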
Ad-hoc Lexer
Hand-write code to generate tokens.
Partition the input string by reading left-to-right,
recognizing one token at a time.
Look-ahead is required to decide where one token ends
and the next token begins.
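import java.io.IOException;
import java.io.InputStream;

class Lexer
{
    InputStream s;
    char next; // look-ahead character
    Lexer(InputStream _s) throws IOException
    {
        s = _s;
        next = (char) s.read(); // read() returns an int; -1 (end of input) becomes '\uffff'
    }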
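    Token nextToken() throws IOException {
        // an identifier cannot start with a digit, so only letters and '_' are tested here
        if( Character.isLetter(next) || next == '_' )
            return readId();
        if( Character.isDigit(next) )
            return readNumber();
        if( next == '"' )
            return readString();
        ...
        ...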
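    Token readId() throws IOException {
        String id = "";
        while(true){
            id = id + next;             // append the current look-ahead character
            next = (char) s.read();     // advance; next now holds the following character
            if(idChar(next) == false)   // first character that cannot continue the identifier
                return
                    new Token(TID,id);
        }
    }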
    // true if c may appear inside an identifier
    boolean idChar(char c)
    {
        if( Character.isLetter(c) )
            return true;
        if( Character.isDigit(c) )
            return true;
        if( c == '_' )
            return true;
        return false;
    }
    Token readNumber() throws IOException {
        String num = "";
        while(true){
            num = num + next;           // append the current look-ahead digit
            next = (char) s.read();     // advance the look-ahead
            if( !Character.isDigit(next) )
                return
                    new Token(TNUM,num);
        }
    }
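For reference, the pieces above can be assembled into one small runnable program. This is a sketch under assumptions rather than the lecture's exact code: it reads from a java.io.Reader, represents tokens with a hypothetical Tok class, skips whitespace, and treats any other single character as a one-character symbol.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical token representation: a kind (TID, TNUM, TSTR, TSYM, TEOF) plus its text.
class Tok {
    final String kind, text;
    Tok(String kind, String text) { this.kind = kind; this.text = text; }
    @Override public String toString() { return kind + "(" + text + ")"; }
}

class AdHocLexerSketch {
    private final Reader in;
    private int next;                        // one-character look-ahead; -1 means end of input

    AdHocLexerSketch(Reader in) throws IOException {
        this.in = in;
        next = in.read();
    }

    Tok nextToken() throws IOException {
        while (next != -1 && Character.isWhitespace(next)) next = in.read();  // skip blanks
        if (next == -1) return new Tok("TEOF", "");
        if (Character.isLetter(next) || next == '_') return readId();
        if (Character.isDigit(next)) return readNumber();
        if (next == '"') return readString();
        Tok t = new Tok("TSYM", String.valueOf((char) next));  // anything else: one-char symbol
        next = in.read();
        return t;
    }

    private Tok readId() throws IOException {
        StringBuilder id = new StringBuilder();
        while (idChar(next)) { id.append((char) next); next = in.read(); }
        return new Tok("TID", id.toString());
    }

    private Tok readNumber() throws IOException {
        StringBuilder num = new StringBuilder();
        while (next != -1 && Character.isDigit(next)) { num.append((char) next); next = in.read(); }
        return new Tok("TNUM", num.toString());
    }

    private Tok readString() throws IOException {
        StringBuilder s = new StringBuilder();
        next = in.read();                                      // skip the opening quote
        while (next != -1 && next != '"') { s.append((char) next); next = in.read(); }
        if (next == '"') next = in.read();                     // skip the closing quote if present
        return new Tok("TSTR", s.toString());
    }

    private static boolean idChar(int c) {
        return c != -1 && (Character.isLetter(c) || Character.isDigit(c) || c == '_');
    }

    // Convenience helper: tokenizes a whole string and returns the printed tokens.
    static List<String> tokenize(String source) {
        try {
            AdHocLexerSketch lexer = new AdHocLexerSketch(new StringReader(source));
            List<String> out = new ArrayList<>();
            for (Tok t = lexer.nextToken(); !t.kind.equals("TEOF"); t = lexer.nextToken()) {
                out.add(t.toString());
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("z1 = 10;"));
    }
}
```

Note that this sketch emits "==" as two separate "=" symbols and does not distinguish keywords from identifiers.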
Problems:
The first character alone does not tell us
what kind of token we are about to read.
If token begins with “i”, is it an
identifier “i” or keyword “if”?
If token begins with “=”, is it
“=” or “==”?
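A hand-written lexer usually resolves both cases by reading further before deciding. The sketch below is one possible way to do it, with hypothetical names (LookaheadSketch, scanWord, scanEquals): read the whole word and only then look it up in a keyword table, and peek one character past "=" to choose between "=" and "==".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class LookaheadSketch {
    static final Set<String> KEYWORDS =
            new HashSet<>(Arrays.asList("if", "else", "while", "for"));

    // Reads the longest identifier-like word starting at pos, then decides keyword vs identifier.
    static String scanWord(String src, int pos) {
        int end = pos;
        while (end < src.length()
                && (Character.isLetterOrDigit(src.charAt(end)) || src.charAt(end) == '_')) {
            end++;
        }
        String word = src.substring(pos, end);
        return (KEYWORDS.contains(word) ? "KEYWORD " : "ID ") + word;
    }

    // Called with pos at an '='; peeks one character ahead to choose between "=" and "==".
    static String scanEquals(String src, int pos) {
        boolean doubled = pos + 1 < src.length() && src.charAt(pos + 1) == '=';
        return doubled ? "SYMBOL ==" : "SYMBOL =";
    }

    public static void main(String[] args) {
        System.out.println(scanWord("if(x)", 0));
        System.out.println(scanWord("i = 1", 0));
        System.out.println(scanEquals("a == b", 2));
    }
}
```

Each such special case adds more hand-written look-ahead logic, which is why this style does not scale well.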
A more principled approach is needed:
use a lexer generator that generates an efficient tokenizer automatically.
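To give a feel for the specification-driven style, the sketch below describes token classes as (name, regular expression) pairs and lets java.util.regex do the matching. It is not a lexer generator: tools such as lex or JFlex compile the specification into an automaton and pick the longest match, whereas this sketch simply tries the rules in the order listed. The rule table and the class name SpecDrivenLexerSketch are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpecDrivenLexerSketch {
    // Token classes as (name, regular expression) pairs, in priority order.
    static final String[][] SPEC = {
        {"WS",      "\\s+"},
        {"KEYWORD", "(?:if|else|while|for)\\b"},
        {"ID",      "[A-Za-z_][A-Za-z0-9_]*"},
        {"NUM",     "\\d+"},
        {"SYMBOL",  "==|[()+*/{}<>=;]"},
    };

    static final Pattern MASTER = buildMaster();

    // Combines all rules into one pattern with a named group per token class.
    static Pattern buildMaster() {
        StringBuilder sb = new StringBuilder();
        for (String[] rule : SPEC) {
            if (sb.length() > 0) sb.append('|');
            sb.append("(?<").append(rule[0]).append('>').append(rule[1]).append(')');
        }
        return Pattern.compile(sb.toString());
    }

    static List<String> tokenize(String src) {
        List<String> out = new ArrayList<>();
        Matcher m = MASTER.matcher(src);
        int pos = 0;
        while (pos < src.length()) {
            m.region(pos, src.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("unexpected character at " + pos);
            }
            for (String[] rule : SPEC) {
                if (m.group(rule[0]) != null) {                 // the rule that matched
                    if (!rule[0].equals("WS")) out.add(rule[0] + " " + m.group());
                    break;
                }
            }
            pos = m.end();                                      // continue after the token
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("if(b == 0) a = b"));
    }
}
```

Adding a new token class means adding one row to the table; the scanning loop stays unchanged.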