CS 346: Code Generation
Resource: Textbook
Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman,
"Compilers: Principles, Techniques, and Tools", Addison-Wesley, 1986.
Compiler Architecture
[Diagram: Source language → Scanner (lexical analysis) → tokens → Parser (syntax analysis) → syntactic structure → Semantic Analysis (IC generator) → Intermediate Language → Optimizer → Intermediate Language → Code Generator → Target language; all phases consult the Symbol Table.]
Code Generator
Severe requirements imposed
Output must be correct and of high quality
Generator should run efficiently
Generating optimal code is undecidable
Must rely on heuristics
Choice of heuristics is very important
Details depend on
target language
operating system
Certain generic issues are inherent in the design of essentially all code generators
Input to Code Generator
Input to the code generator consists of:
Intermediate code produced by the front end (and perhaps
optimizer)
Remember that intermediate code can come in many forms
Three-address code is the most popular, but several techniques apply to other forms as well
Information in the symbol table
used to determine run-time addresses of data objects
Code generator typically assumes that:
Input is free of errors
Type checking has taken place and necessary type-conversion
operators have already been inserted
Output of Code Generator
Target program: output of the code generator
Possible forms of the target program:
absolute machine language
relocatable machine language
Can compile subprograms separately
Added expense of linking and loading
assembly language
Makes task of code generation simpler
Added cost of assembly phase
Memory Management
Compiler must map names in source code to addresses of data
objects at run-time
Done cooperatively by front-end and code generator
Generator uses information in symbol table
For the generated machine code:
Labels in three-address statements need to be converted to addresses
of instructions
Process is analogous to backpatching
Instruction Selection
Complexities
The level of intermediate representation
Nature of the instruction-set architecture
Desired quality of the generated code
Intermediate representation (high level)
Translate each intermediate statement into a sequence of machine
instructions
Statement-by-statement code generation is not efficient
Needs further optimization
Intermediate representation (low level)
Low-level details can be useful for generating efficient code
Instruction Selection
If efficiency is not a concern, instruction selection is straightforward
For each type of three-address statement, there is a code skeleton that
outlines target code
Example, x := y + z, where x, y, and z are statically located,
can be translated as:
MOV y, R0 /* load y into register r0 */
ADD z, R0 /* add z to r0 */
MOV R0, x /* store R0 into x */
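The skeleton approach can be sketched in Python. This is a minimal illustration, not from the text; the helper name `gen_add` and the fixed choice of register R0 are assumptions.

```python
# Sketch of skeleton-based instruction selection for x := y + z:
# the statement form expands into a fixed code template with the
# operand names substituted in.
def gen_add(x, y, z):
    """Expand the code skeleton for the statement x := y + z."""
    return [
        f"MOV {y}, R0",   # load y into register R0
        f"ADD {z}, R0",   # add z to R0
        f"MOV R0, {x}",   # store R0 into x
    ]

print(gen_add("x", "y", "z"))
```

A real generator would hold one such template per three-address statement form and dispatch on the operator.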
Instruction Selection
Often, the straightforward technique produces poor code. For

a := b + c
d := a + e

it generates:

MOV b, R0
ADD c, R0
MOV R0, a
MOV a, R0
ADD e, R0
MOV R0, d

The 3rd and 4th instructions are redundant: a is already in R0.
A naïve translation may lead to correct but unacceptably inefficient target code. For a := a + 1:

Naïve:            Better:
MOV a, R0         INC a
ADD #1, R0
MOV R0, a
Register Allocation
Instructions are usually faster if operands are in registers instead
of memory
Efficient utilization of registers is important in generating
good code
Register allocation selects the set of variables that will reside in
registers
A register assignment phase picks the specific register in which a
variable resides
Finding an optimal assignment of registers to variables is
difficult
Problem is NP-complete
Certain machines require register-pairs for certain
operators and results
Address Modes of Sample Machine
MODE                FORM    ADDRESS                     ADDED COST
absolute            M       M                           1
register            R       R                           0
indexed             c(R)    c + contents(R)             1
indirect register   *R      contents(R)                 0
indirect indexed    *c(R)   contents(c + contents(R))   1

MODE      FORM    CONSTANT    ADDED COST
literal   #c      c           1
Instruction Costs
Cost of instruction for sample computer:
1 + the costs associated with the source and destination address modes
Corresponds to the length (in words) of the instruction
Address modes involving only registers have cost zero
Modes with a memory location or literal have cost one, since the operand must be stored with the instruction
Time taken to fetch instruction often exceeds time spent executing
instruction
Example: a := b + c
The following solution has cost 6:
MOV b, R0
ADD c, R0
MOV R0, a
The following solution also has cost 6:
MOV b, a
ADD c, a
If R0, R1, and R2 contain the addresses of a, b, and c,
respectively, the cost can be reduced to 2:
MOV *R1, *R0
ADD *R2, *R0
If R1 and R2 contain the values of b and c, respectively, and b is
not needed again, the cost can be reduced to 3:
ADD R2, R1
MOV R1, a
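The cost model above can be sketched in Python. This is an illustrative sketch, not from the text; classifying operands by string prefix (any name starting with "R" counts as a register) is a simplifying assumption.

```python
# Cost model from the address-mode table: an operand adds one word
# (cost 1) for absolute, indexed, indirect-indexed, and literal modes,
# and nothing for pure register modes.
def mode_cost(operand):
    if operand.startswith("#"):              # literal #c: extra word
        return 1
    if operand.startswith("*"):
        return 1 if "(" in operand else 0    # *c(R) costs 1, *R costs 0
    if "(" in operand:                       # indexed c(R): extra word
        return 1
    if operand.startswith("R"):              # register: no extra word
        return 0
    return 1                                 # absolute memory address

def seq_cost(instrs):
    # each instruction costs 1 plus its source and destination mode costs
    return sum(1 + mode_cost(src) + mode_cost(dst) for _, src, dst in instrs)

print(seq_cost([("MOV", "b", "R0"), ("ADD", "c", "R0"), ("MOV", "R0", "a")]))  # 6
```

Applying it to the four sequences for a := b + c reproduces the costs 6, 6, 2, and 3 given above.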
Memory organization during program execution
Main memory (code area; data area: global/static area, stack, free space, heap)
Stores large amounts of data
Slower access
Registers
Hold a very small amount of data
Faster access
Code Area
Addresses in the code area are static (i.e. they do not change during execution) for most programming languages
Addresses are known at compile time
[Diagram: code area laid out as proc 1, proc 2, …, proc n, each with its own entry point]
Data Area
Addresses in data area are static for some data and dynamic
for others
Static data are located in static area
Dynamic data are located in stack or heap
Stack (LIFO allocation) for procedure activation records, etc.
Heap for user allocated memory, etc.
Registers
General-purpose registers
Used for calculation
Special purpose registers
Program counter (pc)
Stack pointer (sp)
Frame pointer (fp)
Argument pointer (ap)
Addresses in Target Code (partitions)
Program runs in own logical address
Partitions of logical addresses
Code
Statically determined area that holds executable target code
Size of target code determined at compile time
Static
Holds global constants and other data
Size of these entities determined at compile time
Addresses in Target Code
Heap
Holds data objects created and freed during program
execution
Size of heap can’t be determined at compile time
Stack
Dynamically managed area for activation records (created
and destroyed during procedure calls and returns)
Size can't be determined at compile time
Activation Records
Activation records store information needed during the execution of a
procedure
Two possible storage-allocation strategies for activation records:
static allocation (decision can be taken by looking at program)
stack allocation (decision taken at run time)
An activation record has fields which hold:
result and parameters
machine-status information
local data and temporaries
Size and layout of activation records are communicated to code
generator via information in the symbol table
Sample Activation Records
Assume that run-time memory has areas for code, static data,
and optionally a stack
Heap is not being used in these examples
Static Allocation
A call statement in the intermediate code is implemented by
two target-machine instructions:
MOV #here+20, callee.static_area
GOTO callee.code_area
MOV instruction saves the return address
GOTO statement transfers control to the target code of the called
procedure
A return statement in the intermediate code is implemented
by one target-machine instruction:
GOTO *callee.static_area
Sample Code
/* code for c */
100: action1
120: MOV #140, 364 /* save return address 140 */
132: GOTO 200 /* call p */
140: action2
160: HALT
…
/* code for p */
200: action3
220: GOTO *364
…
/* 300-363: activation record for c */
300: /* return address */
304: /* local data for c */
…
/* 364–451: activation record for p */
364: /* return address */
368: /* local data for p */
Stack Allocation
Stack allocation uses relative addresses for storage in
activation records
address can be offset from any known position in activation record
uses positive offset from SP, a pointer to beginning of activation
record at top of stack
Position of an activation record is not known until run-time
position usually stored in a register
indexed address mode is convenient
Stack Allocation
The code for the first procedure initializes the stack:
LD SP, #stackstart //initialize the stack
…code for the first procedure…
HALT
A procedure call increments SP, saves the return address, and transfers
control:
ADD SP, #caller.recordsize //increment SP
ST 0(SP), #here+16 //save return address
BR callee.code_area //jump to the callee
A return sequence has two parts:
First control is transferred to the return address
BR *0(SP) //return to caller
Next, in the caller, SP is restored to its previous value
SUB SP, #caller.recordsize
Stack Allocation
action1 //code for m
call q
action2
halt
action3 //code for p
return
action4 //code for q
call p
action5
call q
action6
call q
return
Stack Allocation
msize: size of activation record for procedure m (20)
psize: size of activation record for procedure p (40)
qsize: size of activation record for procedure q (60)
First word in each activation record holds return address
Code for m, p, and q starts at addresses 100, 200, and 300, respectively
Stack starts at address 600
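The SP arithmetic of the call and return sequences can be traced in a few lines of Python. This is an illustrative simulation (memory modeled as a dict of word addresses), covering m's call to q with msize = 20 and the stack at 600.

```python
# Trace of the call/return sequence for m calling q.
msize = 20
memory = {}
sp = 600                  # LD SP, #600: initialize the stack

# Call sequence in m (addresses 128-144):
sp += msize               # ADD SP, #msize: SP now points at q's record
memory[sp] = 152          # ST 0(SP), #152: push return address
# BR 300 transfers control to q ...

# Return sequence: BR *0(SP) in q jumps to memory[sp],
# then SUB SP, #msize back in m restores SP.
return_address = memory[sp]
sp -= msize

print(return_address, sp)   # 152 600
```

Control comes back to address 152 with SP restored to 600, matching the listing above.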
Stack Allocation
//Code for m
100: LD SP, #600 //Initializes STACK
108: ACTION1 //code for action 1
128: ADD SP, #msize //call sequence begins
136: ST 0(SP), #152 //push return address
144: BR 300 // call q
152: SUB SP, #msize //restore SP
160: ACTION2
180: HALT
…
Stack Allocation
//Code for p
200: ACTION3
220: BR *0(SP) //return
…
//code for q
300: ACTION4
320: ADD SP, #qsize //call sequence begins
328: ST 0(SP), #344 //push return address
336: BR 200 // call p
344: SUB SP, #qsize //restore SP
352: ACTION5
372: ADD SP, #qsize
…
Stack Allocation
380: ST 0(SP), #396 //push return address
388: BR 300 //call q
396: SUB SP, #qsize //restore SP
404: ACTION6
424: ADD SP, #qsize
432: ST 0(SP), #448 //push return address
440: BR 300
448: SUB SP, #qsize
456: BR *0(SP) //return
…
600: //stack starts here
Basic Blocks
A basic block is a sequence of statements such that:
Flow of control enters at start
Flow of control leaves at end
No possibility of halting or branching except at end
A name is "live" at a given point if its value will be used
again in the program
Each basic block has a first statement known as the "leader"
of the basic block
Partitioning Code into Basic Blocks
Determination of leaders:
The first statement is a leader
Any statement that is the target of a conditional or unconditional goto is a
leader
Any statement immediately following a conditional or unconditional goto is a leader
A basic block:
Starts with a leader
Includes all statements up to but not including the next leader
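The three leader rules and the block boundaries can be sketched in Python. This is an illustrative sketch; statements are numbered from 1 as in these notes, and jumps maps each (conditional or unconditional) goto's index to its target.

```python
# Leader-based partitioning of a statement sequence into basic blocks.
def find_leaders(n_stmts, jumps):
    leaders = {1}                            # rule 1: first statement
    for i, target in jumps.items():
        leaders.add(target)                  # rule 2: jump target
        if i + 1 <= n_stmts:
            leaders.add(i + 1)               # rule 3: statement after a jump
    return sorted(leaders)

def basic_blocks(n_stmts, jumps):
    ls = find_leaders(n_stmts, jumps) + [n_stmts + 1]
    # each block runs from a leader up to (not including) the next leader
    return [(ls[k], ls[k + 1] - 1) for k in range(len(ls) - 1)]

# 17 statements with gotos at 9, 11, and 17 (the identity-matrix
# example later in these notes):
print(basic_blocks(17, {9: 3, 11: 2, 17: 13}))
```

This reproduces leaders 1, 2, 3, 10, 12, 13 and blocks B1 through B6 of that example.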
Basic Block Example
Source:                              Three-address code:
begin                                (1)  prod := 0
    prod := 0;                       (2)  i := 1
    i := 1;                          (3)  t1 := 4 * i
    do                               (4)  t2 := a[t1]
    begin                            (5)  t3 := 4 * i
        prod := prod + a[i] * b[i]   (6)  t4 := b[t3]
    end                              (7)  t5 := t2 * t4
    while i <= 20                    (8)  t6 := prod + t5
end                                  (9)  prod := t6
                                     (10) t7 := i + 1
                                     (11) i := t7
                                     (12) if i <= 20 goto (3)
                                     (13) …
Transformations on Basic Blocks
Basic block computes a set of expressions
Expressions: values of names that are live on exit from the
block
Two basic blocks are equivalent if they compute the same
set of expressions
Certain transformations can be applied without changing the
computed expressions of a block
Optimizer uses such transformations to improve running
time or space requirements of a program
Important classes of local transformations:
structure-preserving transformations
algebraic transformations
Example Transformations (1)
Algebraic Transformations:
Statements such as x := x + 0 or x := x * 1 can be safely removed
Statement x := y ^ 2 can be safely changed to x := y * y
Structure preserving transformations
Common sub-expression elimination
Before:              After:
a := b + c           a := b + c
b := a – d           b := a – d
c := b + c           c := b + c
d := a - d           d := b
Dead-code elimination
Suppose the statement x := y + z appears in a basic block and x is dead
(i.e. never used again)
This statement can be safely removed from the code
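Dead-code elimination within a block can be sketched as a backward scan in Python. This is an illustrative sketch, not the text's algorithm verbatim; statements are modeled as (target, op1, op2) triples.

```python
# Backward dead-code elimination inside one basic block: drop any
# assignment whose target is not live at that point; a kept
# instruction makes its operands live.
def eliminate_dead_code(block, live_on_exit):
    """block: list of (target, op1, op2); returns surviving statements."""
    live = set(live_on_exit)
    kept = []
    for target, op1, op2 in reversed(block):
        if target not in live:
            continue                  # x := y op z with x dead: remove it
        kept.append((target, op1, op2))
        live.discard(target)
        live.update((op1, op2))       # its operands must stay available
    return kept[::-1]

# t is never used and not live on exit, so its definition disappears:
print(eliminate_dead_code([("t", "y", "z"), ("x", "y", "z")], {"x"}))
```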
Example Transformations (2)
Renaming of Temporary variables
Statement t := b + c appears in a basic block and t is a
temporary
We can safely rename all instances of this t to u, where u is a new
temporary
A normal-form block uses a new temporary for every statement
that defines a temporary
Interchange of two independent adjacent statements
Suppose we have a block with two adjacent statements:
t1 := b + c
t2 := x + y
If t1 is distinct from x and y and t2 is distinct from b and c, we
can safely change the order of these two statements
Next-Use Information
Defining "use" of a name in a three-address statement:
Suppose statement i assigns a value to x
Suppose statement j has x as an operand
We say that j uses the value of x computed at i if:
there is a path through which control can flow from
statement i to statement j
path has no intervening assignments to x
For each three-address statement that is of the form x := y op z
We want to determine the next uses of x, y, and z
May be within or outside the current basic block
Why next use information?
Knowing when the value of variable will be used next is essential for
generating good code
If the value of a variable stored in a register is never used again, the register can be freed for other uses
Determine Next-Use
Scan each basic block from end to beginning
At start, record, if known, which names are live on exit from that
block
Otherwise, assume all non-temporaries are live
If algorithm allows certain temporaries to be live on exit from
block, consider them live as well
On reaching a three-address statement i of the form x := y op z:
Attach to statement i the information currently in the symbol table regarding the next use and liveness of x, y, and z
Next, in the symbol table, set x to "not live" and "no next use"
Then set y and z to "live" and set the next uses of y and z to i
A similar approach is taken for unary operators
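The backward scan can be sketched in Python. This is an illustrative sketch: statements are 1-based (x, y, z) triples for x := y op z, and the symbol table records (live, next_use) pairs.

```python
# Backward next-use scan for one basic block. Returns, for each
# statement i, the (live, next_use) entries that were attached to it.
def next_use_info(block, live_on_exit):
    table = {name: (True, None) for name in live_on_exit}
    attached = {}
    for i in range(len(block), 0, -1):
        x, y, z = block[i - 1]
        # attach the current symbol-table entries for x, y, z to statement i
        attached[i] = {n: table.get(n, (False, None)) for n in (x, y, z)}
        table[x] = (False, None)       # x: not live, no next use
        table[y] = (True, i)           # y and z: live, next used at i
        table[z] = (True, i)
    return attached

info = next_use_info([("t1", "a", "b"), ("t2", "t1", "c")], {"t2"})
print(info[1]["t1"])   # (True, 2): t1 is next used by statement 2
```

Note that x is marked dead before y and z are marked live, so a statement like x := x op z is handled correctly.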
Flow Graphs
A graphical representation of three-address statements, useful for
optimization
Nodes represent computations
Each node represents a single basic block
One node is distinguished as initial
Edges represent flow-of-control
An edge exists from B1 to B2 if and only if B2 can immediately
follow B1 in some execution sequence:
True if there is a conditional or unconditional jump from the
last statement of B1 to the first statement of B2
True if B2 immediately follows B1 and B1 does not end with
an unconditional jump
B1 is a predecessor of B2, B2 is a successor of B1
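The two edge rules can be sketched in Python. This is an illustrative sketch with 0-based block indices; blocks lists each block's (leader, last) statement range in program order.

```python
# Flow-graph edge construction: an edge B1 -> B2 exists if B1 ends
# with a jump to B2's leader, or B2 follows B1 and B1 does not end
# with an unconditional jump.
def flow_edges(blocks, jumps, unconditional=()):
    leader_of = {leader: b for b, (leader, _) in enumerate(blocks)}
    edges = set()
    for b, (leader, last) in enumerate(blocks):
        if last in jumps:
            edges.add((b, leader_of[jumps[last]]))   # edge to jump target
        if last not in unconditional and b + 1 < len(blocks):
            edges.add((b, b + 1))                    # fall-through edge
    return sorted(edges)

# Blocks B1-B6 of the identity-matrix example (all jumps conditional):
print(flow_edges([(1, 1), (2, 2), (3, 9), (10, 11), (12, 12), (13, 17)],
                 {9: 3, 11: 2, 17: 13}))
```

For that example this yields the self-loops B3→B3 and B6→B6, the back edge B4→B2, and the fall-through edges.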
Flow Graph (Example)
for i from 1 to 10 do
    for j from 1 to 10 do
        a[i, j] = 0.0;
for i from 1 to 10 do
    a[i, i] = 1.0;
Flow Graph
1.  i = 1
2.  j = 1
3.  t1 = 10 * i
4.  t2 = t1 + j
5.  t3 = 8 * t2
6.  t4 = t3 - 88
7.  a[t4] = 0.0
8.  j = j + 1
9.  if j <= 10 goto (3)
10. i = i + 1
11. if i <= 10 goto (2)
12. i = 1
13. t5 = i - 1
14. t6 = 88 * t5
15. a[t6] = 1.0
16. i = i + 1
17. if i <= 10 goto (13)

Intermediate code to set a 10 × 10 matrix to an identity matrix
Leaders and Basic Blocks
Leaders
1 (first instruction)
2,3 and 13 (target of jumps)
10 and 12 (statements immediately following jumps)
Basic blocks
B1: 1
B2: 2
B3: 3-9
B4: 10-11
B5: 12
B6: 13-17
Flow Graph
[Figure: flow graph with nodes B1–B6; edges B1→B2, B2→B3, B3→B3, B3→B4, B4→B2, B4→B5, B5→B6, B6→B6]
Representations of Flow Graphs
A basic block can be represented by a record consisting of:
A count of the number of quadruples in the block
Or, a pointer to the last instruction
A pointer to the leader of the block
The list of predecessors and successors
An alternative is to use a linked list of quadruples
More flexible when the number of instructions in a basic block changes
Explicit references to quadruple numbers in jump statements can cause
problems:
Quadruples can be moved during optimization
Better to have jumps point to blocks
Loops and Flow Graphs
Programs spend most of their execution time in loops (while, do-while, for, etc.)
Important for a compiler to generate good code for loops
A loop is a set of nodes L with a loop entry node e such that:
1. e is not ENTRY, the entry node of the entire flow graph
2. No node in L other than e has a predecessor outside L
   i.e. every path from ENTRY to any node in L goes through e
3. Every node in L has a nonempty path, completely within L, to e
Reusing Temporaries
Convenient during optimization for every temporary to have its own
name
Space can be saved
Two temporaries can be packed into the same location if they are not
live simultaneously
Before packing:      After packing:
t1 := a * a          t1 := a * a
t2 := a * b          t2 := a * b
t3 := 2 * t2         t3 := 2 * t2
t4 := t1 + t3        t1 := t1 + t3
t5 := b * b          t2 := b * b
t6 := t4 + t5        t1 := t1 + t2
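One possible packing policy can be sketched in Python. This is an illustrative sketch with an assumed conservative policy (a temporary's location is released only on the statement after its last use, and a new definition takes the lowest-numbered free slot); the particular renaming it picks may differ from the one shown above, but the six temporaries still fit in three locations.

```python
# Packing temporaries into shared locations based on last uses.
def pack_temporaries(block, last_use):
    """block: list of (target, op1, op2); last_use: temp -> 0-based
    index of the statement that reads it last."""
    mapping, free, out, fresh = {}, [], [], 0
    for i, (target, op1, op2) in enumerate(block):
        # release slots of temporaries whose last use is already past
        for t in [t for t in mapping if last_use.get(t, i) < i]:
            free.append(mapping.pop(t))
        free.sort()
        if free:
            slot = free.pop(0)           # reuse lowest-numbered free slot
        else:
            fresh += 1
            slot = f"t{fresh}"           # otherwise allocate a new one
        a, b = mapping.get(op1, op1), mapping.get(op2, op2)
        mapping[target] = slot
        out.append((slot, a, b))
    return out

packed = pack_temporaries(
    [("t1", "a", "a"), ("t2", "a", "b"), ("t3", "2", "t2"),
     ("t4", "t1", "t3"), ("t5", "b", "b"), ("t6", "t4", "t5")],
    {"t1": 3, "t2": 2, "t3": 3, "t4": 5, "t5": 5})
print(len({slot for slot, _, _ in packed}))   # 3 locations suffice
```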
Directed Acyclic Graphs (DAGS)
DAG: Represents basic block
Directed acyclic graph such that:
Leaves represent the initial values of names
Labeled by unique identifiers, either variable names or constants
Operator applied to name determines if l-value (address) or r-
value (contents) is needed; usually it is r-value
Interior nodes are labeled by the operators
Nodes are optionally also given a sequence of identifiers
(identifiers that are assigned their values)
Useful for implementing transformations on and determining
information about a basic block
DAG Example
[Figure: DAG for a sample basic block]
Using DAGS
DAG can be automatically constructed from code using a simple
algorithm
Several useful pieces of information can be obtained:
Common sub-expressions
Which identifiers have their values used in the block?
Which statements compute values that could be used outside
the block?
Can be used to reconstruct a simplified list of quadruples
Can evaluate interior nodes of DAG in any order that is a
topological sort (all children before parent)
Heuristics exist to find good orders
If DAG is a tree, a simple algorithm exists to give optimal
order (order leading to shortest instruction sequence)
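DAG construction can be sketched in Python as value numbering. This is an illustrative sketch, not the text's algorithm verbatim: each (op, left, right) triple names at most one node, so a repeated computation attaches to an existing node instead of creating a new one.

```python
# DAG construction for a basic block by value numbering.
def build_dag(block):
    """block: list of (target, op, left, right) statements.
    Returns (nodes, defs): nodes are (op/'leaf', left, right) triples,
    defs maps each name to the node currently holding its value."""
    nodes, index, defs = [], {}, {}
    def node_for(name):
        if name in defs:                 # name was assigned in this block
            return defs[name]
        key = ("leaf", name, None)       # otherwise a leaf: initial value
        if key not in index:
            index[key] = len(nodes)
            nodes.append(key)
        return index[key]
    for target, op, left, right in block:
        key = (op, node_for(left), node_for(right))
        if key not in index:             # genuinely new computation
            index[key] = len(nodes)
            nodes.append(key)
        defs[target] = index[key]        # may attach to an existing node
    return nodes, defs

# The common-subexpression example: d := a - d matches b's node,
# while c := b + c does not match a := b + c (b and c were redefined).
nodes, defs = build_dag([("a", "+", "b", "c"), ("b", "-", "a", "d"),
                         ("c", "+", "b", "c"), ("d", "-", "a", "d")])
print(defs["d"] == defs["b"])   # True: d can be replaced by b
```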
Generating Code from DAGS (1)
Original order:      Reordered (from the DAG):
t1 := a + b          t2 := c + d
t2 := c + d          t3 := e – t2
t3 := e – t2         t1 := a + b
t4 := t1 – t3        t4 := t1 – t3
Generating Code from DAGS (2)
Assume only t4 is live on exit of previous example and two registers
(R0 and R1) exist
The code generation algorithm discussed earlier leads to the following solutions (the second saves two instructions)
From original order:    From reordered code:
MOV a, R0               MOV c, R0
ADD b, R0               ADD d, R0
MOV c, R1               MOV e, R1
ADD d, R1               SUB R0, R1
MOV R0, t1              MOV a, R0
MOV e, R0               ADD b, R0
SUB R1, R0              SUB R1, R0
MOV t1, R1              MOV R0, t4
SUB R0, R1
MOV R1, t4