Unit 6
Code Generation
Code optimization
Under this topic we cover machine-independent optimizations. Machine-dependent optimizations, such
as register allocation and the use of special machine-instruction sequences (machine idioms), are
covered under the next topic, "Code generation".
The principal sources of optimization: There are some useful code-improving transformations. A
transformation is called local if it can be performed by looking only at the statements in a basic block;
otherwise, it is called global. Usually, local transformations are done first.
Function-preserving transformations: Some function-preserving transformations are: 1. Common
subexpression elimination 2. Copy propagation 3. Dead-code elimination 4. Constant folding.
1. Common subexpression elimination: An expression E is called a common subexpression if E
was previously computed, and the values of variables in E have not changed since the previous
computation.
2. Copy propagation: Assignments of the form a:=b are called copy statements or copies. Copy
statements are introduced by some optimization algorithms, such as the algorithm for common-
subexpression elimination. The idea behind the copy-propagation transformation is to use b for a
wherever possible after the copy statement a:=b.
3. Dead-code elimination: If the value of a variable is not used after a certain point, the variable is
dead at that point. Similarly, dead code (useless code) consists of statements that compute values
which are never used. Copy propagation often turns a copy statement into dead code.
4. Constant folding: If the value of an expression can be computed at compile time, the constant can
be used in place of the expression. This is called constant folding. A short sketch combining all
four transformations is given below.
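The following C fragment (all names are hypothetical, chosen only for illustration) shows how the
four transformations interact; the comments indicate what an optimizer would do at each line.

/* before optimization */
int demo(int a[], int b[], int i) {
    int t1 = 4 * i;     /* first computation of 4*i                 */
    int x  = a[t1];
    int t2 = 4 * i;     /* common subexpression: reuse t1 instead   */
    int y  = b[t2];
    int c  = x;         /* copy statement c:=x                      */
    int z  = c + y;     /* copy propagation: replace c by x         */
    int w  = 2 * 3;     /* constant folding: replace by 6           */
    return z + w;
}
/* after common-subexpression elimination, copy propagation,
 * dead-code elimination and constant folding, the body is
 * equivalent to:
 *     int t1 = 4 * i;
 *     int z  = a[t1] + b[t1];    (t2 and c are now dead)
 *     return z + 6;
 */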
Loop optimizations: Since programs spend most of their time in loops, especially inner loops, loops are
very important places for optimization. Some important loop-optimization techniques are: 1. Code
motion 2. Induction-variable elimination 3. Reduction in strength.
1. Code motion: A loop-invariant computation (an expression whose value does not change during
loop iterations) can be moved before the loop. This decreases the amount of code inside the loop.
2. Induction-variable elimination: If a:=b*4 is an assignment, then every time b increases by 1, a
increases by 4. Here a and b are induction variables. When there are two or more induction
variables in a loop, it may be possible to eliminate all except one.
3. Reduction in strength: Here an expensive operation is replaced by a cheaper one. For ex., addition
may replace multiplication. A sketch combining these techniques is given below.
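A minimal C sketch (the functions before() and after() are hypothetical) of code motion together with
strength reduction; the transformed loop updates an offset by addition instead of recomputing a
multiplication:

void before(int a[], int n, int x, int y) {
    for (int j = 0; j < n; j++)
        a[j * 4] = x + y;   /* x+y is loop-invariant; j*4 is an induction expression */
}

void after(int a[], int n, int x, int y) {
    int t = x + y;          /* code motion: hoisted out of the loop   */
    int t4 = 0;             /* induction variable tracking j*4        */
    for (int j = 0; j < n; j++) {
        a[t4] = t;
        t4 += 4;            /* reduction in strength: + replaces *    */
    }
}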
Basic block: A basic block is a sequence of consecutive statements in which flow of control enters at
the beginning and leaves at the end, without a halt or possibility of branching except at the end.
Flow graph (B1 falls through to B2; B2 and B3 loop on themselves; B4 branches to B6 or falls
through to B5; B5 jumps back to B2):

B1: i:=m-1
    j:=n
    t1:=4*n
    v:=a[t1]

B2: i:=i+1
    t2:=4*i
    t3:=a[t2]
    if t3<v goto B2

B3: j:=j-1
    t4:=4*j
    t5:=a[t4]
    if t5>v goto B3

B4: if i>=j goto B6

B5: t6:=4*i
    x:=a[t6]
    t7:=4*i
    t8:=4*j
    t9:=a[t8]
    a[t7]:=t9
    t10:=4*j
    a[t10]:=x
    goto B2

B6: t11:=4*i
    x:=a[t11]
    t12:=4*i
    t13:=4*n
    t14:=a[t13]
    a[t12]:=t14
    t15:=4*n
    a[t15]:=x
Common subexpression elimination: After removing the common subexpressions 4*i and 4*j, block B5
becomes:
t6:=4*i
x:=a[t6]
t8:=4*j
t9:=a[t8]
a[t6]:=t9
a[t8]:=x
goto B2
Now, using global common subexpression elimination 4*i and 4*j can be replaced by t2 and t4,
respectively. Therefore, B5 becomes:
x:=a[t2]
t9:=a[t4]
a[t2]:=t9
a[t4]:=x
goto B2
Now a[t2] and a[t4] can be replaced by t3 and t5, respectively. Finally, B5 becomes:
x:=t3
t9:=t5
a[t2]:=t9
a[t4]:=x
goto B2
Copy propagation: using t3 for x and t5 for t9 after the copies, B5 becomes:
x:=t3
t9:=t5
a[t2]:=t5
a[t4]:=t3
goto B2
Dead-code elimination: the copies x:=t3 and t9:=t5 are now dead, so B5 becomes:
a[t2]:=t5
a[t4]:=t3
goto B2
Applying the same transformations to B6 (the common subexpressions 4*i and 4*n are available as
t2 and t1), B6 becomes:
x:=t3
t14:=a[t1]
a[t2]:=t14
a[t1]:=t3
After dead-code elimination B6 becomes:
t14:=a[t1]
a[t2]:=t14
a[t1]:=t3
In a loop consisting of B3 alone, j and t4 are induction variables. By applying induction-variable
elimination and reduction in strength, we replace the assignment t4:=4*j by t4:=t4-4. We place an
initialization of t4 at the end of the block where j itself is initialized (we add t4:=4*j at the end of
block B1). A similar transformation is done in block B2.
Now the only use of i and j is in the test in block B4, which can be replaced by t2>=t4. Now i and j
become dead variables, and the code assigning values to them is also dead code.
Final flow graph:

B1: i:=m-1
    j:=n
    t1:=4*n
    v:=a[t1]
    t2:=4*i
    t4:=4*j

B2: t2:=t2+4
    t3:=a[t2]
    if t3<v goto B2

B3: t4:=t4-4
    t5:=a[t4]
    if t5>v goto B3

B4: if t2>=t4 goto B6

B5: a[t2]:=t5
    a[t4]:=t3
    goto B2

B6: t14:=a[t1]
    a[t2]:=t14
    a[t1]:=t3
The dag representation of basic blocks: Directed acyclic graphs (dags) are useful data structures for
implementing transformations on basic blocks. Using a dag, we can determine the common
subexpressions within a block, determine which names are used inside the block but evaluated outside
it, and determine which statements of the block could have their computed value used outside the block.
A dag for a basic block is a directed acyclic graph with the following labels on its nodes:
1. Leaves are labeled by unique identifiers, which are either variable names or constants.
2. Interior nodes are labeled by an operator symbol.
3. Nodes are also optionally given a sequence of identifiers as labels.
Ex.
1. t1:=4*i
2. t2:=a[t1]
3. t3:=4*i
4. t4:=b[t3]
5. t5:=t2*t4
6. t6:=prod+t5
7. prod:=t6
8. t7:=i+1
9. i:=t7
10. if i<=20 goto (1)
(dag: the leaves are a, b, 4, i0, prod0, 1 and 20; node t1,t3 is * with children 4 and i0; node t2 is [ ]
with children a and t1; node t4 is [ ] with children b and t1; node t5 is * with children t2 and t4; node
t6,prod is + with children prod0 and t5; node t7,i is + with children i0 and 1; node (1) is <= with
children i and 20.)
Rewriting the block from the dag, the common subexpression 4*i is computed only once and the copy
into t3 disappears:
t1:=4*i
t2:=a[t1]
t4:=b[t1]
t5:=t2*t4
prod:=prod+t5
i:=i+1
if i<=20 goto (1)
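As a sketch of how such a dag can be built, the following C fragment reuses an existing node whenever
an identical (operator, left child, right child) triple already exists, which is exactly how the two
computations of 4*i above collapse to a single node. The data structures and names are illustrative
assumptions, not from the notes.

#include <string.h>

enum { MAXN = 256 };
typedef struct {
    char op;                /* operator, or 0 for a leaf        */
    int  left, right;       /* child indices, -1 if none        */
    char name[8];           /* identifier/constant for leaves   */
} DagNode;

static DagNode nodes[MAXN];
static int nnodes;

/* return an existing interior node with the same operator and
   children, or create a new one */
int dag_node(char op, int left, int right) {
    for (int k = 0; k < nnodes; k++)
        if (nodes[k].op == op && nodes[k].left == left
                              && nodes[k].right == right)
            return k;                 /* common subexpression found */
    nodes[nnodes].op = op;
    nodes[nnodes].left = left;
    nodes[nnodes].right = right;
    return nnodes++;
}

/* return the leaf for an initial value, creating it on first use */
int dag_leaf(const char *name) {
    for (int k = 0; k < nnodes; k++)
        if (nodes[k].op == 0 && strcmp(nodes[k].name, name) == 0)
            return k;
    nodes[nnodes].op = 0;
    nodes[nnodes].left = nodes[nnodes].right = -1;
    strcpy(nodes[nnodes].name, name);
    return nnodes++;
}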
Dominators: If every path from the initial node to n goes through d, then node d dominates node n.
Domination information can be presented using a dominator tree.
Ex. (figure: a flow graph on nodes 1-10 and its dominator tree. In the dominator tree, node 1 is the
root with children 2 and 3; node 3 has child 4; node 4 has children 5, 6 and 7; node 7 has child 8; and
node 8 has children 9 and 10.)
Natural loops: Natural loops can be easily improved. Using dominator information, we can find the
natural loops in a flow graph. A natural loop has the following properties:
1. It has a single entry point called the header. The header dominates all nodes in the loop.
2. There is at least one way to iterate the loop.
A back edge is an edge whose head dominates its tail (if a->b is an edge, b is the head and a is the
tail). The natural loop of a back edge n->d is d plus the set of nodes that can reach n without going
through d.
In the flow graph above, the back edges are 7->4, 10->7, 4->3, 8->3 and 9->1, and the corresponding
natural loops are {4,5,6,7,8,10}, {7,8,10}, {3,4,5,6,7,8,10} (for both edges 4->3 and 8->3), and the
entire flow graph, respectively. A sketch of the standard marking algorithm for a natural loop is given
below.
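A minimal C sketch of the usual worklist construction of a natural loop: starting from the tail n of a
back edge n->d, walk predecessor edges backwards, never passing through the header d. The graph
representation (predecessor lists) is an assumption made for illustration.

enum { NNODES = 11 };               /* nodes 1..10, index 0 unused      */
int npred[NNODES];
int pred[NNODES][NNODES];           /* pred[m] lists predecessors of m  */

/* mark loop[m]=1 for every node m in the natural loop of back edge n->d */
void natural_loop(int n, int d, int loop[NNODES]) {
    int stack[NNODES], sp = 0;
    for (int m = 0; m < NNODES; m++) loop[m] = 0;
    loop[d] = 1;                    /* the header is in the loop, and the  */
                                    /* backward search never crosses it    */
    if (!loop[n]) { loop[n] = 1; stack[sp++] = n; }
    while (sp > 0) {
        int m = stack[--sp];
        for (int p = 0; p < npred[m]; p++)
            if (!loop[pred[m][p]]) {        /* unvisited predecessor */
                loop[pred[m][p]] = 1;
                stack[sp++] = pred[m][p];
            }
    }
}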
Pre-header: Many code-optimization transformations need to move statements before the header. A
preheader is a new block created for this purpose.
Reducible flow graphs: In a reducible flow graph there are no jumps into the middle of loops from
outside; the only entry to a loop is through its header.
A flow graph is reducible if its edges can be partitioned into two disjoint groups as follows:
1. The forward edges form an acyclic graph in which every node can be reached from the initial node of
G. 2. The back edges consist only of edges whose heads dominate their tails.
The flow graph given above satisfies these conditions and is therefore reducible.
However, the following flow graph does not satisfy them and is therefore nonreducible:
(figure: nodes 1, 2 and 3, where 1 branches to both 2 and 3, and 2 and 3 jump to each other; the cycle
{2,3} has no unique header, since it can be entered at either node.)
Programs in many languages give rise only to reducible flow graphs as long as gotos are not used.
Data-flow equations (global data-flow analysis): A data-flow equation has the form:
out[S] = gen[S] U (in[S] - kill[S])
That is, the information at the end of a statement is either generated within the statement, or enters at
the beginning and is not killed as control flows through the statement.
Data-flow information can be used to find opportunities for constant folding. Algorithms for code
motion and induction-variable elimination also use this information.
A definition of a variable x is a statement that assigns a value to x. (Reaching definitions) A definition d
reaches a point p if there is a path from the point immediately following d to p, such that d is not killed
along the path.
/* d1 */ i:=m-1;
/* d2 */ j:=n;
/* d3 */ a:=u1;
do
/* d4 */ i:=i+1;
/* d5 */ j:=j-1;
if e1 then
/* d6 */ a:=u2
else
/* d7 */ i:=u3
while e2
For the blocks containing d6 and d7:
B3 contains d6: a:=u2, so gen[B3]={d6} and kill[B3]={d3}.
B4 contains d7: i:=u3, so gen[B4]={d7} and kill[B4]={d1,d4}.
Representing each set as a bit vector over the seven definitions d1-d7 (block B1 contains d1, d2, d3
and block B2 contains d4, d5):
in[B2]  = out[B1] U out[B3] U out[B4]
        = 111 0000 + 000 0010 + 000 0001 = 111 0011
out[B2] = gen[B2] U (in[B2] - kill[B2])
        = 000 1100 + (111 0011 - 110 0001) = 001 1110
From the second pass onwards there is no change in the out sets, so the iterative algorithm terminates.
A C sketch of this iterative computation is given below.
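A minimal C sketch of the iterative reaching-definitions computation over bit vectors; the block count,
the predecessor representation and the use of one unsigned word per set are illustrative assumptions.

enum { NB = 5 };                      /* number of basic blocks (assumed) */
unsigned gen[NB], kill[NB], in[NB], out[NB];
int npreds[NB];
int preds[NB][NB];                    /* preds[b] lists predecessors of b */

void reaching_definitions(void) {
    for (int b = 0; b < NB; b++)
        out[b] = gen[b];              /* initial estimate: in[b] empty    */
    int changed = 1;
    while (changed) {                 /* iterate to a fixed point         */
        changed = 0;
        for (int b = 0; b < NB; b++) {
            in[b] = 0;
            for (int p = 0; p < npreds[b]; p++)
                in[b] |= out[preds[b][p]];   /* union over predecessors   */
            unsigned o = gen[b] | (in[b] & ~kill[b]);
            if (o != out[b]) { out[b] = o; changed = 1; }
        }
    }
}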
(figure: a flow graph with blocks B1-B5 containing the following definitions:)
B1: d1: i:=2
    d2: j:=i+1
B2: d3: i:=1
B3: d4: j:=j+1
B4: d5: j:=j-4
B5: (no definitions)
In the flow graph above, there are three uses of names: d2 uses i, and d4 and d5 use j.
The ud-chain of i in d2 is only {d1}.
Since the use of j at d4 in B3 is not preceded by a definition of j in B3, we have to consider in[B3]
(computed using the in and out sets). in[B3]={d2,d3,d4,d5}. Of these, all except d3 are definitions of j,
so the ud-chain of j in d4 is {d2,d4,d5}.
Since the use of j at d5 in block B4 is not preceded by a definition of j in B4, we have to consider
in[B4]. in[B4]={d3,d4}. Of these, only d4 defines j, so the ud-chain of j in d5 is only {d4}.
Application of ud-chains: If only one definition of a name A reaches a point p, and that definition is
A:=5, then we know that A has the value 5 at that point, and we can substitute 5 for A if there is a use
of A at p.
Ex.: in[B5]={d3,d4,d5}. Of these, only d3: i:=1 is a definition of i. Therefore, if there were a use of i
in B5 that preceded any definition of i in B5, it could be replaced by a use of the constant 1.
Code generation
Our target machine is a byte-addressable machine with four bytes per word and n general purpose
registers, R0, R1, . . . , Rn-1. It has two-address instructions of the form
Op source destination
Some commonly used instructions are:
MOV (move source to destination)
ADD (add source to destination)
SUB (subtract source from destination)
The addressing modes, their forms, the addresses they denote, and their added costs are:

Mode               Form    Address                  Added cost
Absolute           M       M                        1
Register           R       R                        0
Indexed            c(R)    c+contents(R)            1
Indirect register  *R      contents(R)              0
Indirect indexed   *c(R)   contents(c+contents(R))  1
Instruction costs: The cost of an instruction is one plus the costs associated with its source and
destination addressing modes. Instruction cost corresponds to the length of the instruction, because in
most machines the time taken to fetch an instruction exceeds the time taken to execute it.
Ex. The statement a:=b+c can be implemented in several ways, with different costs:

MOV b,a
ADD c,a          cost = 6

MOV *R1,*R0
ADD *R2,*R0      cost = 2   (assuming R0, R1, R2 hold the addresses of a, b, c)

ADD R2,R1
MOV R1,a         cost = 3   (assuming b and c are in registers R1 and R2)
Ex. The assignment d:=(a-b)+(a-c)+(a-c) may be translated into the following three-address
statements:
t1:=a-b
t2:=a-c
t3:=t1+t2
d:=t2+t3
(Here d is live at the end.)
The code generation algorithm produces the code below for these three-address statements.
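(Assuming two registers R0 and R1 are available, the output would be along the following lines; the
exact register choices depend on the algorithm's register-selection function getreg:)
MOV a,R0
SUB b,R0
MOV a,R1
SUB c,R1
ADD R1,R0
ADD R1,R0
MOV R0,d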
The code generation algorithm produces the following code for the indexed assignment a:=b[i]:

i in register Ri:   MOV b(Ri),R               cost = 2
i in memory Mi:     MOV Mi,R ; MOV b(R),R     cost = 4
i on the stack:     MOV Si(A),R ; MOV b(R),R  cost = 4

In the case of the stack, we assume that i is on the stack at offset Si and that the pointer to the
activation record for i is in register A.
The code generation algorithm produces analogous code for pointer assignments such as a:=*p and
*p:=a, depending on whether p is in a register, in memory, or on the stack.
Conditional statements: We use the instruction CJ<= z, which means: if the condition code is negative
or zero, jump to z. The instruction CMP x,y sets the condition code according to the comparison of x
and y.
For ex., if x<y goto z can be implemented by
CMP x,y
CJ< z
Similarly, the pair x:=y+z; if x<0 goto z can be implemented by
MOV y,R0
ADD z,R0
MOV R0,x
CJ< z
Here the condition code is already set by the value computed into R0, so no separate CMP is needed.
Peephole optimization: This is a simple but effective technique for improving the target code locally. A
short sequence of target instructions (called the peephole) is examined and replaced by a shorter or
faster sequence wherever possible. We regard peephole optimization as a technique for improving the
quality of the target code, but the same technique can also be applied directly after intermediate code
generation to improve the intermediate representation.
The peephole is a small, moving window on the target program. The code in the peephole need not be
contiguous, but some implementations require this. Each improvement may create new opportunities for
further improvement. Therefore repeated passes over the target code may be necessary to get the
maximum benefit. Given below are the program transformations usually done in peephole optimization
technique:
Redundant-instruction elimination ( Redundant Loads and Stores):
In the following instruction sequence instruction (2) can be deleted.
(1) MOV R0,a
(2) MOV a,R0
If (2) had a label, we could not be sure that (1) is always executed immediately before (2), and so we
could not remove (2). In other words, (1) and (2) must be in the same basic block for this
transformation to be safe. A sketch of such a peephole pass is given below.
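A minimal C sketch of a two-instruction peephole that drops a load "MOV a,R0" when it immediately
follows "MOV R0,a" and carries no label; the instruction representation is an assumption made for
illustration.

#include <string.h>

typedef struct {
    char op[8], src[16], dst[16];
    int has_label;                    /* nonzero if a jump may land here */
} Ins;

/* compact code[] in place, returning the new length */
int peephole_loads(Ins code[], int n) {
    int m = 0;
    for (int i = 0; i < n; i++) {
        if (m > 0 && !code[i].has_label           /* same basic block   */
            && strcmp(code[i].op,  "MOV") == 0
            && strcmp(code[m-1].op, "MOV") == 0
            && strcmp(code[i].src,  code[m-1].dst) == 0
            && strcmp(code[i].dst,  code[m-1].src) == 0)
            continue;                             /* drop redundant load */
        code[m++] = code[i];
    }
    return m;
}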
Unreachable code: Unreachable instructions can be removed. An unlabeled instruction immediately
following an unconditional jump can be removed. For ex., consider the C program fragment below:
#define debug 0
…
if(debug) {
print debugging information
}
Since debug is set to 0 at the beginning of the program (we should do a global reaching-definitions
data-flow analysis to find the definition of debug reaching the if statement), constant propagation will
give us the following code:
if 0 != 1 goto L2
print debugging information
L2:
The first line of the above code can be replaced by goto L2, since the condition 0 != 1 is always true.
Now all the statements that print debugging information are unreachable and can be eliminated.
Flow-of-control optimizations: The intermediate code generation algorithms frequently produce jumps
to jumps, jumps to conditional jumps, etc. These unnecessary jumps can be eliminated.
For ex., the sequence
if a<b goto L1
…
L1: goto L2
can be replaced by
if a<b goto L2
…
L1: goto L2
Algebraic simplification: Simple code generation algorithms may produce statements that can be
simplified using algebraic identities, such as:
x:=x+0
or
x:=x*1
Such statements can easily be eliminated by peephole optimization.
Reduction in strength: We can replace expensive operations by equivalent cheaper operations available
on the target machine. For ex., x^2 is cheaper to implement as x*x than as a call to an exponentiation
routine, and fixed-point multiplication or division by a power of two is cheaper to implement as a shift.
Use of machine idioms: The target machine may have hardware instructions that implement certain
specific operations efficiently. For ex., some machines have auto-increment and auto-decrement
addressing modes, which add or subtract one from an operand before or after using its value. These
modes improve the code substantially when pushing or popping a stack, as required in parameter
passing.
Generating code from dags: Now we will generate code for a basic block from its dag representation.
From a dag it is easier to see the order of the final computation sequence than from a linear sequence
of three-address statements or quadruples. When the dag is a tree, we can generate code that is
provably optimal under criteria such as program length or the fewest no. of temporaries used.
Ex. 1 Given below is an ex. which shows how the order of computation can affect the cost of the
resulting object code.
Basic block: t1:= a+b
t2:= c+d
t3:= e-t2
t4:= t1-t3
dag:
(node t4 is - with left child t1 and right child t3; t1 is + with leaves a0 and b0; t3 is - with left leaf e0
and right child t2; t2 is + with leaves c0 and d0.)
Now, using the code-generation algorithm, we get the following code sequence (we assume that two
registers R0 and R1 are available, and that only t4 is live on exit from the given block):
MOV a,R0
ADD b,R0
MOV c,R1
ADD d,R1
MOV R0,t1
MOV e,R0
SUB R1,R0
MOV t1,R1
SUB R0,R1
MOV R1,t4
Now, we rearrange the order of statements such that t1 is computed just before t4, as shown below:
t2:= c+d
t3:= e-t2
t1:= a+b
t4:= t1-t3
Again using the code-generation algorithm, we get the following code sequence. Here we have saved
two instructions: MOV R0,t1 (which stored the value of R0 in memory location t1) and MOV t1,R1
(which reloaded the value of t1 into R1):
MOV c,R0
ADD d,R0
MOV e,R1
SUB R0,R1
MOV a,R0
ADD b,R0
SUB R1,R0
MOV R0,t4
For efficient computation of t4, the left argument of t4 must be in a register. Therefore t1, i.e., the left
operand of t4, was computed just before t4; that is why the above reordering improved the code. Given
below is an algorithm that, whenever possible, makes the evaluation of a node immediately follow the
evaluation of its leftmost argument. This is called the heuristic ordering algorithm (node-listing
algorithm). It gives the ordering in reverse order.
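A sketch of this heuristic, following the standard formulation:

while unlisted interior nodes remain do begin
    select an unlisted node n, all of whose parents have been listed;
    list n;
    while the leftmost child m of n has no unlisted parents
          and m is not a leaf do begin
        /* since n was just listed, m can now be listed directly after it */
        list m;
        n := m
    end
end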
Ex. 2
a dag:
(node 1 is * with children 2 and 3; node 2 is + with children 6 and 4; node 3 is - with children 4 and
12; node 4 is * with children 5 and 8; node 5 is - with children 6 and 7; node 6 is + with children 9 and
10; node 8 is + with children 11 and 12; the leaves are c (node 7), a (node 9), b (node 10), d (node 11)
and e (node 12).)
Now, applying the above node-listing algorithm gives us the order 1, 2, 3, 4, 5, 6, 8. Reversing this list,
we get 8, 6, 5, 4, 3, 2, 1. Therefore, the corresponding sequence of three-address statements is:
t8:= d+e
t6:= a+b
t5:= t6-c
t4:= t5*t8
t3:= t4-e
t2:= t6+t4
t1:=t2*t3
We use an algorithm called the labeling algorithm to determine the optimal order of evaluation of the
statements in a basic block when the dag representation of the block is a tree. The optimal order gives
the shortest instruction sequence.
The labeling algorithm has two parts. The first part labels each node of the tree, bottom-up, with an
integer that denotes the fewest no. of registers required to evaluate the tree with no storage of
intermediate results. The second part is a tree traversal whose order is governed by the computed node
labels; the output code is generated during this traversal.
Given the two operands of a binary operator, this algorithm evaluates the operand requiring more
registers first. If both operands require the same no. of registers, either one can be evaluated first.
1. if n is a leaf then
2.     if n is the leftmost child of its parent then
3.         label(n) := 1
4.     else label(n) := 0
   else begin /* n is an interior node */
5.     let n1, n2, ..., nk be the children of n ordered by label,
           so that label(n1) >= label(n2) >= ... >= label(nk);
6.     label(n) := max (label(ni) + i - 1) over 1 <= i <= k
   end
In the important special case where n is a binary node whose children have labels l1 and l2, the
formula of line 6 reduces to:
label(n) = max(l1, l2)  if l1 != l2
         = l1 + 1       if l1 = l2
A C sketch of this labeling is given below.
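A minimal sketch in C, assuming a simple binary node structure (the TNode type and its fields are
illustrative, not from the notes):

typedef struct tnode {
    struct tnode *left, *right;   /* both NULL for a leaf               */
    int is_leftmost;              /* 1 if leftmost child of its parent  */
    int label;
} TNode;

static int max2(int a, int b) { return a > b ? a : b; }

/* label the tree bottom-up with the fewest registers needed
   to evaluate it without storing intermediate results */
int su_label(TNode *n) {
    if (n->left == NULL && n->right == NULL)      /* leaf: lines 1-4 */
        return n->label = n->is_leftmost ? 1 : 0;
    int l1 = su_label(n->left);
    int l2 = su_label(n->right);
    /* binary special case of line 6 */
    return n->label = (l1 == l2) ? l1 + 1 : max2(l1, l2);
}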
dag:
(the same tree as before: t4 is - with children t1 and t3; t1 is + with leaves a0 and b0; t3 is - with leaf
e0 and child t2; t2 is + with leaves c0 and d0.)
labeled tree:
(t4 has label 2; its children t1 and t3 have labels 1 and 2; t3's children e and t2 have labels 1 and 1;
the leftmost leaves a, e and c have label 1, while b and d have label 0.)
Code generation from a labeled tree: Our code generation algorithm takes a labeled tree T as input and
produces as output a machine-code sequence that evaluates T into a register. The algorithm uses a
recursive procedure gencode(n) to produce machine code that evaluates the subtree of T with root n
into a register. The procedure gencode uses a stack rstack to allocate registers (suppose r registers are
available), and a stack tstack to allocate temporary memory locations.
procedure gencode(n);
begin
    /* case 0 */
    if n is a left leaf representing operand name and n is the leftmost child of its parent then
        print 'MOV' || name || ',' || top(rstack)
    else if n is an interior node with operator op, left child n1, and right child n2 then
        /* case 1 */
        if label(n2) = 0 then begin
            let name be the operand represented by n2;
            gencode(n1);
            print op || name || ',' || top(rstack)
        end
        /* case 2 */
        else if 1 <= label(n1) < label(n2) and label(n1) < r then begin
            swap(rstack);
            gencode(n2);
            R := pop(rstack); /* n2 was evaluated into register R */
            gencode(n1);
            print op || R || ',' || top(rstack);
            push(rstack, R);
            swap(rstack)
        end
        /* case 3 */
        else if 1 <= label(n2) <= label(n1) and label(n2) < r then begin
            gencode(n1);
            R := pop(rstack); /* n1 was evaluated into register R */
            gencode(n2);
            print op || top(rstack) || ',' || R;
            push(rstack, R)
        end
        /* case 4: both labels >= r, the total no. of registers */
        else begin
            gencode(n2);
            T := pop(tstack);
            print 'MOV' || top(rstack) || ',' || T;
            gencode(n1);
            push(tstack, T);
            print op || T || ',' || top(rstack)
        end
end
Ex. We can generate code for the labeled tree given above. Suppose rstack = R0, R1 initially. A trace
of the gencode routine is shown below; alongside each call, the contents of rstack at the time of the
call are shown in brackets, with the top at the right end.
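(A reconstruction of the trace and the resulting code for this tree; the case numbers refer to gencode
above:)

gencode(t4)            [R1, R0]   case 2: swap rstack, evaluate t3 first
    gencode(t3)        [R0, R1]   case 3
        gencode(e)     [R0, R1]   case 0: prints MOV e,R1
        gencode(t2)    [R0]       case 1
            gencode(c) [R0]       case 0: prints MOV c,R0
                                  prints ADD d,R0
                                  prints SUB R0,R1
    gencode(t1)        [R0]       case 1
        gencode(a)     [R0]       case 0: prints MOV a,R0
                                  prints ADD b,R0
                                  prints SUB R1,R0

The generated code, which evaluates t4 into R0, is:
MOV e,R1
MOV c,R0
ADD d,R0
SUB R0,R1
MOV a,R0
ADD b,R0
SUB R1,R0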
Algebraic properties like commutativity and associativity of operators can be used to replace a given
tree T by one with smaller labels (to avoid stores in case 4 of gencode) and/or fewer left leaves (to
avoid loads in case 0). For ex., we may replace the first tree below by the one which follows it. This
reduces the no. of left leaves by one and possibly lowers some labels as well. This is possible because
the operator + is commutative.
(figure: a + node labeled max(2,l), with a left leaf labeled 1 and a right subtree T1 labeled l, is
replaced by a + node labeled l, with T1 as its left subtree and the leaf, now labeled 0, on the right.)
Since the operator + is commutative as well as associative, a cluster of nodes labeled + can be replaced
by a left chain. To minimize the label of the root, we have to arrange for Ti1 to be the subtree with the
largest label among T1, T2, T3, T4, and we also have to ensure that Ti1 is not a leaf unless all of
T1, ..., T4 are.
(figure: a tree with root + whose children are T1 and a second + node; the second + node's children
are a + node over T2 and T3, and T4. It is replaced by the left chain +(+(+(Ti1,Ti2),Ti3),Ti4), where
Ti1, ..., Ti4 are T1, ..., T4 in some order.)