
Unit-VI: Code Optimization

Code Generation

Code optimization
Under this topic we cover machine-independent optimizations. Machine-dependent optimizations, such as register allocation and the use of special machine-instruction sequences (machine idioms), are covered under the next topic, "Code generation".

The principal sources of optimization: There are some useful code-improving transformations. A
transformation is called local if it can be performed by looking only at the statements in a basic block;
otherwise, it is called global. Usually, local transformations are done first.
Function-preserving transformations: Some function-preserving transformations are: 1. Common
subexpression elimination 2. Copy propagation 3. Dead-code elimination 4. Constant folding.
1. Common subexpression elimination: An expression E is called a common subexpression if E
was previously computed, and the values of variables in E have not changed since the previous
computation.
2. Copy propagation: Assignments of the form a:=b are called copy statements or copies. Copy
statements are created due to some optimization algorithms like algorithm for common
subexpression elimination. The idea behind copy-propagation transformation is to use b for a
wherever possible after copy statement a:=b.
3. Dead-code elimination: If the value of a variable is not used after a certain point, then the variable is dead at that point. Similarly, dead code (useless code) consists of statements that compute values which are never used. Copy propagation often turns a copy statement into dead code.
4. Constant folding: If the value of an expression is constant at compile time, then constant can be
used instead of an expression. This is called constant folding.
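Taken together, copy propagation and dead-code elimination can be sketched in a few lines. The tuple representation (op, arg1, arg2, result) below is an assumption made for illustration (copies use op ':=' with arg2 = None, and copy targets are assumed not live on exit from the block):

```python
# Sketch: copy propagation followed by dead-code elimination on a basic
# block of three-address tuples (op, arg1, arg2, result).
def propagate_and_clean(block):
    copies, out = {}, []
    for op, a1, a2, res in block:
        a1, a2 = copies.get(a1, a1), copies.get(a2, a2)  # use b for a
        # an assignment to res kills any recorded copy mentioning res
        copies = {t: s for t, s in copies.items() if res not in (t, s)}
        if op == ":=":
            copies[res] = a1          # remember the copy res := a1
        out.append((op, a1, a2, res))
    # dead-code elimination: scan backwards, dropping copies whose
    # target is never used later (assumed not live on block exit)
    used, kept = set(), []
    for op, a1, a2, res in reversed(out):
        if op == ":=" and res not in used:
            continue
        kept.append((op, a1, a2, res))
        used.update(a for a in (a1, a2) if a is not None)
    return kept[::-1]

# x := t1 is propagated into y's computation and then becomes dead
block = [("-", "a", "b", "t1"), (":=", "t1", None, "x"), ("+", "x", "c", "y")]
print(propagate_and_clean(block))   # only t1:=a-b and y:=t1+c survive
```

As the text notes, the copy x := t1 becomes dead precisely because copy propagation rewrote its only use.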

Loop optimizations: Since programs spend most of their time in loops, especially inner loops, loops are very important places for optimization. Some important loop optimization techniques are: 1. Code motion 2. Induction-variable elimination 3. Reduction in strength.
1. Code motion: Loop-invariant computation (expression whose value does not change during loop
iterations) can be moved before the loop. This decreases the amount of code inside a loop.
2. Induction-variable elimination: If a:=b*4 is an assignment inside a loop, then each time b increases by 1, a increases by 4. Here a and b are induction variables. When there are two or more induction variables in a loop, it may be possible to eliminate all but one.
3. Reduction in strength: Here a cheaper operation replaces an expensive one. For example, addition may replace multiplication.
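The effect of combining these two ideas can be shown with a small sketch (the loop and names are illustrative, not from the text): the multiplication a := b*4 inside the loop is replaced by an addition a := a+4 on an accumulator initialized before the loop.

```python
# Naive loop: a multiplication on every iteration
def addresses_naive(n):
    return [4 * b for b in range(n)]

# Strength-reduced loop: a is initialised once, then updated by addition
def addresses_reduced(n):
    out, a = [], 0            # a := 4*b for b = 0, maintained incrementally
    for _ in range(n):
        out.append(a)
        a += 4                # cheap addition replaces the multiplication
    return out
```

Both versions produce the same sequence of values, but the reduced form performs no multiplications inside the loop.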

Ex. C code for quicksort :


void quicksort(m, n)
int m, n;
{
    int i, j;
    int v, x;
    if (n <= m) return;
    /* fragment begins here */
    i = m-1; j = n; v = a[n];
    while (1) {
        do i = i+1; while (a[i] < v);
        do j = j-1; while (a[j] > v);
        if (i >= j) break;
        x = a[i]; a[i] = a[j]; a[j] = x;
    }
    x = a[i]; a[i] = a[n]; a[n] = x;
    /* fragment ends here */
    quicksort(m, j);
    quicksort(i+1, n);
}
Three-address code for fragment above:
 1. i := m-1            2. j := n              3. t1 := 4*n           4. v := a[t1]
 5. i := i+1            6. t2 := 4*i           7. t3 := a[t2]         8. if t3<v goto (5)
 9. j := j-1           10. t4 := 4*j          11. t5 := a[t4]        12. if t5>v goto (9)
13. if i>=j goto (23)  14. t6 := 4*i          15. x := a[t6]         16. t7 := 4*i
17. t8 := 4*j          18. t9 := a[t8]        19. a[t7] := t9        20. t10 := 4*j
21. a[t10] := x        22. goto (5)           23. t11 := 4*i         24. x := a[t11]
25. t12 := 4*i         26. t13 := 4*n         27. t14 := a[t13]      28. a[t12] := t14
29. t15 := 4*n         30. a[t15] := x

Basic block: A basic block is a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end, without halting or branching except at the end.
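Basic blocks are found by computing leaders: the first statement, every target of a jump, and every statement immediately following a jump; each block runs from one leader up to the next. A runnable sketch (the jump table is read off the 30-statement quicksort fragment above; statements are numbered from 1):

```python
# Sketch of basic-block partitioning by finding leaders.
def leaders(n_stmts, jumps):
    lead = {1}                       # the first statement is a leader
    for src, target in jumps.items():
        lead.add(target)             # any target of a jump is a leader
        if src + 1 <= n_stmts:
            lead.add(src + 1)        # so is the statement after a jump
    return sorted(lead)

# the jumps in the 30-statement quicksort fragment above
print(leaders(30, {8: 5, 12: 9, 13: 23, 22: 5}))   # [1, 5, 9, 13, 14, 23]
```

These six leaders start exactly the blocks B1 to B6 of the flow graph below.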
Flow graph: B1
i:=m-1
j:=n
t1:=4*n
v:=a[t1]

B2
i:=i+1
t2:=4*i
t3:=a[t2]
If t3<v goto B2

B3
j:=j-1
t4:=4*j
t5:=a[t4]
If t5>v goto B3
B4

If i>=j goto B6

B5 B6
t6:=4*i t11:=4*i
x:=a[t6] x:=a[t11]
t7:=4*i t12:=4*i
t8:=4*j t13:=4*n
t9:=a[t8] t14:=a[t13]
a[t7]:=t9 a[t12]:=t14
t10:=4*j t15:=4*n
a[t10]:=x a[t15]:=x
goto B2

Common subexpression elimination: After removing common subexpression 4*i and 4*j, block B5
becomes
t6:=4*i
x:=a[t6]
t8:=4*j
t9:=a[t8]
a[t6]:=t9
a[t8]:=x
goto B2
Now, using global common subexpression elimination 4*i and 4*j can be replaced by t2 and t4,
respectively. Therefore, B5 becomes:

x:=a[t2]
t9:=a[t4]
a[t2]:=t9
a[t4]:=x
goto B2

Now a[t2] and a[t4] can be replaced by t3 and t5, respectively. Finally, B5 becomes:

x:=t3
t9:=t5
a[t2]:=t9
a[t4]:=x
goto B2

By similar transformations B6 becomes:

  L.C.S.E.          G.C.S.E.          G.C.S.E.
  t11:=4*i          x:=a[t2]          x:=t3
  x:=a[t11]         t14:=a[t1]        t14:=a[t1]
  t13:=4*n          a[t2]:=t14        a[t2]:=t14
  t14:=a[t13]       a[t1]:=x          a[t1]:=x
  a[t11]:=t14
  a[t13]:=x

After copy propagation transformation B5 becomes:

x:=t3
t9:=t5
a[t2]:=t5
a[t4]:=t3
goto B2

After dead-code elimination B5 becomes:

a[t2]:=t5
a[t4]:=t3
goto B2

After copy propagation transformation B6 becomes:

x:=t3
t14:=a[t1]
a[t2]:=t14
a[t1]:=t3
After dead-code elimination B6 becomes:

t14:=a[t1]
a[t2]:=t14
a[t1]:=t3

Code motion is not applicable to the quicksort ex. given above.

In the loop consisting of B3 alone, j and t4 are induction variables. Applying induction-variable elimination and reduction in strength, we replace the assignment t4:=4*j by t4:=t4-4, and place an initialization of t4 at the end of the block where j itself is initialized (we add t4:=4*j at the end of block B1). A similar transformation is done in block B2.
Now the only use of i and j is in the test in block B4, which can be replaced by t2>=t4. Then i and j become dead variables, and the code assigning values to them is dead code.
Final flow graph:

i:=m-1
j:=n B1
t1:=4*n
v:=a[t1]
t2:=4*i
t4:=4*j

B2

t2:=t2+4
t3:=a[t2]
If t3<v goto B2

B3
t4:=t4-4
t5:=a[t4]
If t5>v goto B3

B4

t2>=t4 goto B6

B5 B6

a[t2]:=t5
a[t4]:=t3 t14:=a[t1]
goto B2 a[t2]:=t14
a[t1]:=t3

The dag representation of basic blocks: Directed acyclic graphs (dags) are useful data structures for
implementing transformation on basic blocks. Using dag, we can determine common subexpressions
within a block, determine names which are used inside the block but evaluated outside the block, and
determine which statements of the block could have their computed value used outside the block.
A dag for a basic block is directed acyclic graph with the following labels on nodes:
1. Leaves are labeled by unique identifiers, which are either variable names or constants.
2. Interior nodes are labeled by an operator symbol
3. Nodes are also optionally given a sequence of identifiers for labels.

Ex.
1. t1:=4*i
2. t2:=a[t1]
3. t3:=4*i
4. t4:=b[t3]
5. t5:=t2*t4
6. t6:=prod+t5
7. prod:=t6
8. t7:=i+1
9. i:=t7
10. if i<=20 goto (1)

Given above is a basic block. The dag for this block, described textually, is as follows:

+ (labeled t6, prod), with children prod0 and * (t5)
* (t5), with children [] (t2) and [] (t4)
[] (t2), with children a and * (t1, t3); [] (t4), with children b and * (t1, t3)
* (t1, t3), with children 4 and i0
+ (t7, i), with children i0 and 1
<= (1), with children + (t7, i) and 20

Application of dags: 1. We can automatically detect common subexpressions. 2. We can determine which identifier values are used in the block: they are those for which a leaf is created at some point. 3. We can determine which statements compute values that can be used outside the block. 4. We can reconstruct a simplified list of quadruples, taking advantage of common subexpressions and avoiding copies like x:=y unless absolutely necessary. Assuming that no temporary is needed outside the block, the simplified list of quadruples is as follows:

t1:=4*i
t2:=a[t1]
t4:=b[t1]
t5:=t2*t4
prod:=prod+t5
i:=i+1
if i<=20 goto (1)
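This reconstruction can be sketched as local value numbering: common subexpressions are detected by hashing (op, arg1, arg2), copies are collapsed into aliases, and each computed value is finally renamed to its live variable. The tuple representation and the pass below are assumptions made here for illustration (the final conditional jump is left out of the sketch):

```python
def simplify(block, temps):
    """Local CSE plus copy elimination on (op, arg1, arg2, result)
    tuples; copies use op ':='. temps are names assumed dead on exit."""
    expr, alias, out = {}, {}, []
    val = lambda x: alias.get(x, x)
    for op, a1, a2, res in block:
        a1, a2 = val(a1), val(a2)
        if op == ":=":
            alias[res] = a1                 # collapse the copy
            continue
        key = (op, a1, a2)
        if key in expr:
            alias[res] = expr[key]          # common subexpression: reuse it
        else:
            alias.pop(res, None)            # res now holds a new value
            out.append([op, a1, a2, res])
            expr[key] = res
        # a new value for res invalidates expressions built from it
        expr = {k: v for k, v in expr.items() if res not in k[1:]}
    # rename a dead temporary's definition to the variable holding its value
    rename = {t: v for v, t in alias.items()
              if v not in temps and t in temps}
    return [(op, a1, a2, rename.get(res, res)) for op, a1, a2, res in out]

block = [("*", 4, "i", "t1"), ("[]", "a", "t1", "t2"),
         ("*", 4, "i", "t3"), ("[]", "b", "t3", "t4"),
         ("*", "t2", "t4", "t5"), ("+", "prod", "t5", "t6"),
         (":=", "t6", None, "prod"), ("+", "i", 1, "t7"),
         (":=", "t7", None, "i")]
for q in simplify(block, temps={f"t{k}" for k in range(1, 9)}):
    print(q)
```

The output is the simplified list above: t3 is folded into t1, t6 into prod, and t7 into i.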

Loops in flow graphs:

Dominators: If every path from initial node to n goes through d, then node d dominates node n.
Domination information can be presented using dominator tree.
Ex.
Fig.: Flow graph — nodes 1 to 10, with edges 1→2, 1→3, 2→3, 3→4, 4→3, 4→5, 4→6, 5→7, 6→7, 7→4, 7→8, 8→3, 8→9, 8→10, 9→1 and 10→7.

Fig.: Dominator tree for the flow graph above — 1 is the root, with children 2 and 3; 4 is the child of 3; 5, 6 and 7 are children of 4; 8 is the child of 7; 9 and 10 are children of 8.

Natural loops: Natural loops can be easily improved. Using dominator information, we can find natural loops in a flow graph. Natural loops have the following properties:
1. A natural loop has a single entry point, called the header. The header dominates all nodes in the loop.
2. There is at least one way to iterate the loop, i.e. at least one path back to the header.

A back edge is an edge whose head dominates its tail (if a→b is an edge, b is the head and a is the tail). The natural loop of a back edge n→d is d plus the set of nodes that can reach n without going through d.
In the flow graph above, the back edges are 7→4, 10→7, 4→3, 8→3 and 9→1, and the corresponding natural loops are {4,5,6,7,8,10}, {7,8,10}, {3,4,5,6,7,8,10} (for both edges 4→3 and 8→3), and the entire flow graph, respectively.
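These definitions can be checked mechanically. The sketch below computes dominators by the usual iterative intersection and then the natural loop of each back edge; the explicit edge list is an assumption, reconstructed to be consistent with the back edges and loops listed above.

```python
# Ten-node flow graph (edge list assumed, consistent with the listed back edges)
SUCC = {1: [2, 3], 2: [3], 3: [4], 4: [3, 5, 6], 5: [7], 6: [7],
        7: [4, 8], 8: [3, 9, 10], 9: [1], 10: [7]}

def preds(succ):
    p = {n: [] for n in succ}
    for a, bs in succ.items():
        for b in bs:
            p[b].append(a)
    return p

def dominators(succ, entry=1):
    """Iterative computation of dom[n] for every node n."""
    p, nodes = preds(succ), set(succ)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = {n} | set.intersection(*(dom[q] for q in p[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

def natural_loop(succ, n, d):
    """Back edge n->d: d plus all nodes reaching n without passing d."""
    p, loop, work = preds(succ), {n, d}, [n]
    while work:
        m = work.pop()
        for q in p[m]:
            if q not in loop:       # d is already in loop, so it blocks paths
                loop.add(q)
                work.append(q)
    return loop

dom = dominators(SUCC)
back = {(a, b) for a, bs in SUCC.items() for b in bs if b in dom[a]}
print(sorted(back))                        # the five back edges listed above
print(sorted(natural_loop(SUCC, 10, 7)))   # [7, 8, 10]
```

Running this reproduces exactly the back edges and natural loops stated in the text.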

Pre-Header: Many code-optimization transformations need to move statements before the header.
Preheader is a new block created for this purpose.

Reducible flow graphs: In reducible flow graph, there are no jumps into the middle of the loops from
outside. The only entry to a loop is through its header.
A flow graph is reducible if its edges can be partitioned into two disjoint groups as follows:
1. The forward edges form an acyclic graph in which every node can be reached from the initial node of
G. 2. The back edges consist only of edges whose heads dominate their tails.

Flow graph given above follows these conditions and therefore is reducible.
However, the following flow graph does not satisfy these conditions and is therefore nonreducible: nodes 1, 2 and 3, with edges 1→2, 1→3, 2→3 and 3→2. Neither 2 nor 3 dominates the other, so neither of the edges 2→3 and 3→2 is a back edge, yet together they form a cycle.
Many languages produce only reducible flow graphs as long as goto's are not used.

Data-flow equations (global data-flow analysis): A data-flow equation has the form:
out[S] = gen[S] U (in[S] - kill[S])
Information at the end of a statement is either generated within the statement, or enters at the beginning and is not killed as control flows through the statement.
Data-flow information can be used to find opportunities for constant folding. Algorithms for code motion and induction-variable elimination also use this information.
Reaching definitions: A definition of a variable x is a statement that assigns a value to x. A definition d reaches a point p if there is a path from the point immediately following d to p such that d is not killed along the path.

Iterative solution of data flow equations:


Program for illustrating reaching definitions:

/* d1 */ i:=m-1;
/* d2 */ j:=n;
/* d3 */ a:=u1;
do
/* d4 */ i:=i+1;
/* d5 */ j:=j-1;
if e1 then
/* d6 */ a:=u2
else
/* d7 */ i:=u3
while e2

Flow graph for illustrating reaching definitions:

d1: i:=m-1 gen[B1]={d1,d2,d3}


B1 d2: j:=n kill[B1]={d4,d5,d6,d7}
d3: a:=u1

d4: i:=i+1 gen[B2]={d4,d5}


B2 d5: j:=j-1 kill[B2]:={d1,d2,d7}

gen[B3]={d6}
d6: a:=u2 B3 kill[B3]={d3}

gen[B4]={d7}
B4 d7: i:=u3 kill[B4]={d1,d4}
in[B2]=out[B1] U out[B3] U out[B4]
=111 0000 + 000 0010 + 000 0001 =111 0011
out[B2]=gen[B2] U (in[B2]-kill[B2])
=000 1100 + ( 111 0011 – 110 0001 ) = 001 1110

Computation of in and out:

Block B    Initial              Pass 1               Pass 2
           in[B]     out[B]     in[B]     out[B]     in[B]     out[B]
B1         000 0000  111 0000   000 0000  111 0000   000 0000  111 0000
B2         000 0000  000 1100   111 0011  001 1110   111 1111  001 1110
B3         000 0000  000 0010   001 1110  000 1110   001 1110  000 1110
B4         000 0000  000 0001   001 1110  001 0111   001 1110  001 0111

From the second pass onwards there is no change in the out sets, so the algorithm terminates.
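The table above can be reproduced by iterating the equation to a fixed point. A sketch, with each set of definitions d1..d7 encoded as a 7-bit integer (leftmost bit = d1) and the gen/kill/predecessor data taken from the figure:

```python
GEN  = {"B1": 0b1110000, "B2": 0b0001100, "B3": 0b0000010, "B4": 0b0000001}
KILL = {"B1": 0b0001111, "B2": 0b1100001, "B3": 0b0010000, "B4": 0b1001000}
PRED = {"B1": [], "B2": ["B1", "B3", "B4"], "B3": ["B2"], "B4": ["B2"]}
ALL  = 0b1111111

def reaching_definitions(gen, kill, pred):
    out = dict(gen)                 # initialise out[B] = gen[B]
    inn = {b: 0 for b in gen}
    changed = True
    while changed:
        changed = False
        for b in gen:               # one pass over the blocks, in order
            inn[b] = 0
            for p in pred[b]:
                inn[b] |= out[p]    # in[B] = union of out[P] over predecessors
            new = gen[b] | (inn[b] & ~kill[b] & ALL)
            if new != out[b]:
                out[b], changed = new, True
    return inn, out

inn, out = reaching_definitions(GEN, KILL, PRED)
print(format(inn["B2"], "07b"), format(out["B2"], "07b"))   # 1111111 0011110
```

The fixed point matches the final column of the table above.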

Computation of ud-chains: ud-chains (use-definition chains) are computed from reaching-definition information.
Flow graph:

d1 i:=2
d2 j:=i+1 B1

d3 i:=1 B2

d4 j:=j+1 B3

B4 d5 j:=j-4

B5

In the flow graph above, there are three uses of names: d2 uses i, and d4 and d5 use j.
ud-chain of i in d2 is only d1.
Since the use of j at d4 in B3 is not preceded by a definition of j in B3, we have to consider in[B3] (computed using the in and out sets). in[B3]={d2,d3,d4,d5}. Of these, all except d3 are definitions of j, so the ud-chain of j in d4 is {d2,d4,d5}.
Since the use of j at d5 in block B4 is not preceded by a definition of j in B4, we have to consider in[B4]. in[B4]={d3,d4}. Of these, only d4 defines j, so the ud-chain of j in d5 is {d4}.
Application of ud-chains: If there is only one definition of name A which reaches a point p, and that
definition is A:=5, then we know that A has the value 5 at that point, and we can substitute 5 for A if
there is a use of A at point p.
Ex.: in[B5]={d3,d4,d5}. Out of these, only d3: i:=1 is a definition of i. Therefore, if there were a use of i
in B5 that preceded any definition of i in B5, it could be replaced by a use of the constant 1.

Code generation
Our target machine is a byte-addressable machine with four bytes per word and n general purpose
registers, R0, R1, . . . , Rn-1. It has two-address instructions of the form
Op source destination
Some commonly used instructions are:
MOV (move source to destination)
ADD (add source to destination)
SUB (subtract source from destination)

Various address modes:

Mode               Form    Address                   Added cost

Absolute           M       M                         1
Register           R       R                         0
Indexed            c(R)    c + contents(R)           1
Indirect register  *R      contents(R)               0
Indirect indexed   *c(R)   contents(c+contents(R))   1

Instruction costs: The cost of an instruction is one plus the costs associated with the source and destination address modes. Instruction cost corresponds to the length of the instruction, because in most machines the time taken to fetch an instruction exceeds the time taken to execute it.

Ex.
MOV b,R0
ADD c,R0       cost = 6
MOV R0,a

MOV b,a
ADD c,a        cost = 6

MOV *R1,*R0
ADD *R2,R0     cost = 2

ADD R2,R1
MOV R1,a       cost = 3
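These costs can be checked with a small helper; the string parsing below is a simplification assumed for two-address instructions of this machine (register and indirect-register operands add 0, all other modes add 1 for the extra word):

```python
def operand_cost(op):
    """0 for register and indirect-register modes, 1 for modes that
    need an extra word (absolute, indexed, indirect indexed)."""
    if op.startswith("*"):
        op = op[1:]                 # indirection itself adds no word
    if op.startswith("R") and "(" not in op:
        return 0                    # plain register
    return 1

def instr_cost(instr):
    opcode, operands = instr.split(None, 1)
    return 1 + sum(operand_cost(o.strip()) for o in operands.split(","))

seq1 = ["MOV b,R0", "ADD c,R0", "MOV R0,a"]
seq2 = ["MOV b,a", "ADD c,a"]
seq3 = ["MOV *R1,*R0", "ADD *R2,R0"]
seq4 = ["ADD R2,R1", "MOV R1,a"]
print([sum(map(instr_cost, s)) for s in (seq1, seq2, seq3, seq4)])  # [6, 6, 2, 3]
```

All four sequences compute a := b + c, yet their costs differ, which is why the code generator's choice of address modes matters.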

A Simple code generator:

A code-generation algorithm: The input to the code-generation algorithm is the sequence of three-address statements of a basic block. The following actions are performed for each three-address statement of the form x := y op z:
1. Call a function getreg to determine the location L where the result of the computation y op z is to be stored. L is usually a register, but it could also be a memory location.
2. Consult the address descriptor for y to determine y', (one of) the current location(s) of y. Prefer the register for y' if the value of y is currently both in memory and a register. If the value of y is not already in L, generate the instruction MOV y',L to place a copy of y in L.
3. Generate the instruction OP z',L where z' is a current location of z. Again prefer a register to a memory location if z is in both. Update the address descriptor of x to indicate that x is in location L. If L is a register, update its descriptor to indicate that it contains the value of x, and remove x from all other register descriptors.
4. If the current values of y and/or z are not live on exit from the block, and are in registers, modify the register descriptors to indicate that after execution of x := y op z, those registers no longer contain y and/or z.
There is a special case when the three-address statement is a copy statement x := y. If y is in a register, simply update the register and address descriptors to record that the value of x is now found in the register holding the value of y.
The function getreg: This function returns the location L to hold the value of x for the assignment x := y op z. It uses the previously collected next-use information.
1. If the name y is in a register that holds the value of no other names (a copy statement may cause a register to hold the values of two or more variables simultaneously), and y is not live and has no next use after execution of x := y op z, then return the register of y for L. Update the address descriptor of y to indicate that y is no longer in L.
2. If step 1 fails, return an empty register for L if one is available.
3. If step 2 also fails, do as follows: if x has a next use in the block, or op is an operator, such as indexing, that requires a register, find an occupied register R. Store the value of R into a memory location (by MOV R,M) if it is not already in the proper memory location M, update the address descriptor for M, and return R. If R holds the values of several variables, a MOV instruction must be generated for each variable to be stored. A suitable occupied register is one whose value is not required in the near future, or whose value is also in memory.
4. If x is not used in the block, or no suitable occupied register can be found, select the memory location of x as L.

Ex. The assignment d:=(a-b) + (a-c) + (a-c) may be translated into the following three-address statements:
t1:=a-b
t2:=a-c
t3:=t1+t2
d:=t2+t3
( Here d is live at end.)

The code generation algorithm produces the code below for the three-address statements given above.

Statements   Code generated   Register descriptor   Address descriptor

                              registers empty
t1:=a-b      MOV a,R0         R0 contains t1        t1 in R0
             SUB b,R0
t2:=a-c      MOV a,R1         R0 contains t1        t1 in R0
             SUB c,R1         R1 contains t2        t2 in R1
t3:=t1+t2    ADD R1,R0        R0 contains t3        t3 in R0
                              R1 contains t2        t2 in R1
d:=t2+t3     ADD R1,R0        R0 contains d         d in R0
             MOV R0,d                               d in R0 and memory

The code generation algorithm produces the following code for indexed assignments:

Statement   i in register Ri      i in memory Mi        i on the stack
            Code         Cost     Code         Cost     Code          Cost
a:=b[i]     MOV b(Ri),R  2        MOV Mi,R     4        MOV Si(A),R   4
                                  MOV b(R),R            MOV b(R),R

a[i]:=b     MOV b,a(Ri)  3        MOV Mi,R     5        MOV Si(A),R   5
                                  MOV b,a(R)            MOV b,a(R)

In the stack case, we assume that i is on the stack at offset Si and that the pointer to the activation record for i is in register A.

The code generation algorithm produces the following code for pointer assignments:

Statement   p in register Rp     p in memory Mp       p on the stack
            Code         Cost    Code         Cost    Code             Cost
a:=*p       MOV *Rp,a    2       MOV Mp,R     3       MOV Sp(A),R      3
                                 MOV *R,R             MOV *R,R

*p:=a       MOV a,*Rp    2       MOV Mp,R     4       MOV a,R          4
                                 MOV a,*R             MOV R,*Sp(A)

Conditional statements: We use the instruction CJ<= z, which means: if the condition code is negative or zero, jump to z.
For example, if x<y goto z can be implemented by
CMP x,y
CJ< z
Similarly, x:=y+z; if x<0 goto z can be implemented by

MOV y,R0
ADD z,R0
MOV R0,x
CJ < z
Peephole optimization: This is a simple but effective technique for improving the target code locally. A short sequence of target instructions (called the peephole) is examined and replaced by a shorter or faster sequence wherever possible. We regard peephole optimization as a technique for improving the quality of the target code, but the same technique can also be applied directly after intermediate code generation to improve the intermediate representation.
The peephole is a small, moving window on the target program. The code in the peephole need not be
contiguous, but some implementations require this. Each improvement may create new opportunities for
further improvement. Therefore repeated passes over the target code may be necessary to get the
maximum benefit. Given below are the program transformations usually done in peephole optimization
technique:
Redundant-instruction elimination ( Redundant Loads and Stores):
In the following instruction sequence instruction (2) can be deleted.
(1) MOV R0,a
(2) MOV a,R0
If (2) had a label, we could not be sure that (1) is always executed immediately before (2), so we could not remove (2). In other words, (1) and (2) must be in the same basic block for this transformation to be safe.
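A sketch of this rule over a two-instruction peephole; the tuple form (opcode, source, destination, has_label) is an assumption made for illustration:

```python
def drop_redundant_loads(code):
    """Delete (2) MOV a,R0 when it directly follows (1) MOV R0,a and
    (2) carries no label (so (1) is always executed just before (2))."""
    out = []
    for op, src, dst, labeled in code:
        if (out and not labeled and op == "MOV"
                and out[-1][0] == "MOV"
                and (src, dst) == (out[-1][2], out[-1][1])):
            continue                       # redundant load: skip it
        out.append((op, src, dst, labeled))
    return out

code = [("MOV", "R0", "a", False), ("MOV", "a", "R0", False),
        ("ADD", "b", "R0", False)]
print(drop_redundant_loads(code))   # the second MOV is removed
```

A labeled reload is kept, matching the safety condition above.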
Unreachable code: Unreachable instructions can be removed. An unlabeled instruction immediately following an unconditional jump can be removed. For example, given below is a C program fragment:
#define debug 0

if(debug) {
    print debugging information
}

An intermediate representation of the above program fragment may be:

if debug = 1 goto L1
goto L2
L1: print debugging information
L2:

One obvious peephole optimization is to eliminate jumps over jumps. The resulting code is:
if debug != 1 goto L2
print debugging information
L2:

Now, since debug is set to 0 at the beginning of the program (we should do a global "reaching definitions" data-flow analysis to find the definition of debug reaching the if statement), constant propagation gives us the following code:

if 0  1 goto L2
print debugging information
L2:

The first line in the above code can be replaced by goto L2, since the condition is always true. Now all the statements that print debugging information are unreachable and can be eliminated.

Flow-of-control optimizations: The intermediate code generation algorithms frequently produce jumps
to jumps, jumps to conditional jumps, etc. These unnecessary jumps can be eliminated.
For ex. the sequence
if a<b goto L1

L1: goto L2

can be replaced by
if a<b goto L2

L1: goto L2

Algebraic simplification: Simple code generation algorithms may produce statements that apply algebraic identities, such as:
x:=x+0
or
x:=x*1
Such statements can be easily eliminated by peephole optimization.

Reduction in strength: We can replace expensive operations by equivalent cheaper operations available on the target machine. For example, x^2 is cheaper to implement as x*x than as a call to an exponentiation routine, and fixed-point multiplication or division by a power of two is cheaper to implement as a shift.

Use of machine idioms: The target machine may have hardware instructions that implement certain specific operations efficiently. For example, some machines have auto-increment and auto-decrement addressing modes, which add or subtract one from an operand before or after using its value. These modes improve the code substantially when used for pushing or popping a stack, as in parameter passing.

Generating code from dags: Now we will generate code for a basic block from its dag representation. From a dag it is easier to see the order of the final computation sequence than from a linear sequence of three-address statements or quadruples. When the dag is a tree, we can generate code that is provably optimal under criteria such as program length or the fewest number of temporaries used.
Ex. 1: The example below shows how the order of computation can affect the cost of the resulting object code.
Basic block: t1:= a+b
t2:= c+d
t3:= e-t2
t4:= t1-t3
dag:
- t4

+ t1 - t3

a0 b0 e0 + t2

c0 d0

Now, using code-generation algorithm, we get the following code sequence ( We assume that two
registers R0 and R1 are available, and only t4 is live on exit from the given block.) :

MOV a,R0
ADD b,R0
MOV c,R1
ADD d,R1
MOV R0,t1
MOV e,R0
SUB R1,R0
MOV t1,R1
SUB R0,R1
MOV R1,t4

Now, we rearrange the order of statements such that t1 is computed just before t4, as shown below:

t2:= c+d
t3:= e-t2
t1:= a+b
t4:= t1-t3

Using the code-generation algorithm again, we get the following code sequence, saving two instructions: MOV R0,t1 (which stored the value of R0 in memory location t1) and MOV t1,R1 (which reloaded the value of t1 into R1):

MOV c,R0
ADD d,R0
MOV e,R1
SUB R0,R1
MOV a,R0
ADD b,R0
SUB R1,R0
MOV R0,t4

For efficient computation of t4, the left argument of t4 must be in a register; that is why computing t1 (the left operand of t4) just before t4 improved the code. Given below is an algorithm that, whenever possible, makes the evaluation of a node immediately follow the evaluation of its leftmost argument. This is called the heuristic ordering (node listing) algorithm. It produces the ordering in reverse.

while unlisted interior nodes remain do begin


select an unlisted node n, all of whose parents have
been listed;
list n;
while the leftmost child m of n has no unlisted parents
and is not a leaf do
/* since n was just listed, m is not yet listed */
begin
list m;
n:= m
end
end

Ex. 2

a dag:
* 1

+ 2 - 3

* 4

- 5 + 8

+ 6 c 7 d 11 e 12

a 9 b 10

Applying the node-listing algorithm above gives us the order 1, 2, 3, 4, 5, 6, 8. Reversing this list, we get the evaluation order 8, 6, 5, 4, 3, 2, 1. The corresponding sequence of three-address statements is:
t8:= d+e
t6:= a+b
t5:= t6-c
t4:= t5*t8
t3:= t4-e
t2:= t6+t4
t1:=t2*t3
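The listing can be reproduced mechanically; the node/children encoding below is an assumption taken from the dag of Ex. 2 above:

```python
CHILDREN = {1: [2, 3], 2: [6, 4], 3: [4, 12], 4: [5, 8],
            5: [6, 7], 6: [9, 10], 8: [11, 12]}   # interior nodes, left to right
LEAVES = {7, 9, 10, 11, 12}                        # c, a, b, d, e

def node_listing(children, leaves):
    parents = {}
    for n, cs in children.items():
        for c in cs:
            parents.setdefault(c, []).append(n)
    order, listed = [], set()
    while set(children) - listed:
        # select an unlisted interior node all of whose parents are listed
        n = min(m for m in set(children) - listed
                if all(p in listed for p in parents.get(m, [])))
        listed.add(n); order.append(n)
        m = children[n][0]                         # leftmost child of n
        while (m not in leaves and m not in listed
               and all(p in listed for p in parents.get(m, []))):
            listed.add(m); order.append(m)         # follow the left chain
            m = children[m][0]
    return order

print(node_listing(CHILDREN, LEAVES))   # [1, 2, 3, 4, 5, 6, 8]
```

Reversing the result gives the evaluation order 8, 6, 5, 4, 3, 2, 1 used for the three-address statements above.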
We use an algorithm called the labeling algorithm to determine the optimal order of evaluation of the statements in a basic block when the dag representation of the block is a tree. The optimal order gives the shortest instruction sequence.
The labeling algorithm has two parts. The first part labels each node of the tree, bottom-up, with an integer that denotes the fewest number of registers required to evaluate the tree with no stores of intermediate results. The second part is a tree traversal whose order is governed by the computed node labels; the output code is generated during this traversal.
Given the two operands of a binary operator, the algorithm evaluates the operand requiring more registers first. If both operands require the same number of registers, either may be evaluated first.

Postorder is always a proper order to do the label computations.

Labeling algorithm (first part: label computation):

1. If n is a leaf then
2. if n is the leftmost child of its parent then
3. label(n):=1
4. else label(n):=0
else begin /* n is an interior node */
5. let n1,n2,…,nk be the children of n ordered by label,
so label(n1)>=label(n2)>=…….>=label(nk);
6. label(n):=max(label(ni)+i-1)
1<=i<=k
end

In the important special case when n is a binary node whose children have labels l1 and l2, the formula of line 6 reduces to:

label(n) = max(l1, l2)  if l1 ≠ l2
         = l1 + 1       if l1 = l2

Ex. Three-address code:


t1:=a+b
t2:=c+d
t3:=e-t2
t4:=t1-t3

dag: t4

+ t1 - t3

a0 b0 e0 + t2

c0 d0
labeled tree:
t4 2
t1 1 t3 2

e 1 t2 1

a 1 b 0 c 1 d 0

Therefore, two registers are needed to evaluate t4 (also for t3).
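The label computation can be sketched directly from the algorithm above; the node encoding (name, children), with children ordered left to right, is an assumption made for illustration:

```python
def compute_labels(node, is_leftmost=True, labels=None):
    labels = {} if labels is None else labels
    name, children = node
    if not children:                       # a leaf (lines 1-4)
        labels[name] = 1 if is_leftmost else 0
    else:                                  # an interior node (lines 5-6)
        for i, child in enumerate(children):
            compute_labels(child, i == 0, labels)
        # order child labels descending; label(n) = max(label(ni) + i - 1)
        ls = sorted((labels[c[0]] for c in children), reverse=True)
        labels[name] = max(l + i for i, l in enumerate(ls))
    return labels

leaf = lambda x: (x, [])
t1 = ("t1", [leaf("a"), leaf("b")])
t2 = ("t2", [leaf("c"), leaf("d")])
t3 = ("t3", [leaf("e"), t2])
t4 = ("t4", [t1, t3])
print(compute_labels(t4))   # t4 and t3 get label 2; t1 and t2 get label 1
```

The result reproduces the labeled tree above: two registers suffice for t4 (and for t3).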

Code generation from a labeled tree: Our code generation algorithm takes a labeled tree T as input and produces as output a machine-code sequence that evaluates T. It uses a recursive procedure gencode(n) to produce machine code that evaluates the subtree of T with root n into a register. The procedure gencode uses a stack rstack to allocate registers (suppose the number of registers available is r), and a stack tstack to allocate temporary memory locations.

procedure gencode(n);
begin
/* case 0 */
if n is a left leaf representing operand name and n is the leftmost child of its parent then
print ‘MOV’ || name || ‘ , ‘ || top(rstack)
else if n is an interior node with operator op, left child n1, and right child n2 then
/* case 1 */
if label(n2)=0 then begin
let name be the operand represented by n2;
gencode(n1);
print op || name || ‘,’ || top(rstack)
end
/* case 2 */
else if 1 <= label(n1) < label(n2) and label (n1) < r then begin
swap(rstack);
gencode(n2);
R:=pop(rstack); /* n2 was evaluated into register R */
gencode(n1);
print op || R || ‘,’ || top(rstack);
push(rstack,R);
swap(rstack)
end
/* case 3 */
else if 1<=label(n2)<=label(n1) and label(n2) < r then begin
gencode(n1);
R:=pop(rstack); /* n1 was evaluated into register R */
gencode(n2);
print op || top(rstack) || ‘,’ || R;
push(rstack,R);
end
/* case 4 ,both labels >= r , the total no. of registers */
else begin
gencode(n2);
T:= pop(tstack);
print ‘MOV’ || top(rstack) ||’,’ || T;
gencode(n1);
push(tstack,T);
print op || T || ’,’ || top(rstack)
end
end

Ex. We can generate code for the labeled tree given above. Suppose rstack = R0,R1 initially. A trace of the gencode routine is shown below; the contents of rstack at the time of each call are shown in brackets alongside, with the top at the right end.

gencode(t4) [R1R0]        /* case 2 */
  gencode(t3) [R0R1]      /* case 3 */
    gencode(e) [R0R1]     /* case 0 */
      print MOV e,R1
    gencode(t2) [R0]      /* case 1 */
      gencode(c) [R0]     /* case 0 */
        print MOV c,R0
      print ADD d,R0
    print SUB R0,R1
  gencode(t1) [R0]        /* case 1 */
    gencode(a) [R0]       /* case 0 */
      print MOV a,R0
    print ADD b,R0
  print SUB R1,R0
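The trace can be reproduced by a direct transcription of gencode into Python; the tree encoding and the label helper below are assumptions (leaves are name strings, interior nodes are (op, left, right) tuples, and rstack's top is the right end of the list):

```python
def label(n, leftmost=True):
    if isinstance(n, str):
        return 1 if leftmost else 0
    _, n1, n2 = n
    l1, l2 = label(n1, True), label(n2, False)
    return max(l1, l2) if l1 != l2 else l1 + 1

def gencode(n, rstack, tstack, out, r=2):
    if isinstance(n, str):                        # case 0: leftmost leaf
        out.append(f"MOV {n},{rstack[-1]}")
        return
    op, n1, n2 = n
    l1, l2 = label(n1, True), label(n2, False)
    if l2 == 0:                                   # case 1: right operand is a leaf
        gencode(n1, rstack, tstack, out, r)
        out.append(f"{op} {n2},{rstack[-1]}")
    elif 1 <= l1 < l2 and l1 < r:                 # case 2: right subtree first
        rstack[-1], rstack[-2] = rstack[-2], rstack[-1]   # swap(rstack)
        gencode(n2, rstack, tstack, out, r)
        R = rstack.pop()
        gencode(n1, rstack, tstack, out, r)
        out.append(f"{op} {R},{rstack[-1]}")
        rstack.append(R)
        rstack[-1], rstack[-2] = rstack[-2], rstack[-1]   # swap back
    elif l2 <= l1 and l2 < r:                     # case 3: left subtree first
        gencode(n1, rstack, tstack, out, r)
        R = rstack.pop()
        gencode(n2, rstack, tstack, out, r)
        out.append(f"{op} {rstack[-1]},{R}")
        rstack.append(R)
    else:                                         # case 4: spill to a temporary
        gencode(n2, rstack, tstack, out, r)
        T = tstack.pop()
        out.append(f"MOV {rstack[-1]},{T}")
        gencode(n1, rstack, tstack, out, r)
        tstack.append(T)
        out.append(f"{op} {T},{rstack[-1]}")

# t4 = t1 - t3, where t1 = a+b and t3 = e - (c+d)
t4 = ("SUB", ("ADD", "a", "b"), ("SUB", "e", ("ADD", "c", "d")))
code = []
gencode(t4, ["R1", "R0"], ["T0"], code)
print("\n".join(code))
```

This prints exactly the seven instructions of the trace above, leaving the result in R0.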

Algebraic properties such as commutativity and associativity of operators can be used to replace a given tree T by one with smaller labels (to avoid stores in case 4 of gencode) and/or fewer left leaves (to avoid loads in case 0). For example, we may replace the first tree below by the second; this reduces the number of left leaves by one and possibly lowers some labels as well. The replacement is possible because the operator + is commutative.

+ max(2,l)

1 l

T1

+ l

l 0

T1

Since the operator + is commutative as well as associative, a cluster of nodes labeled + can be replaced by the left chain which follows. To minimize the label of the root, we have to arrange for Ti1 to be the subtree having the largest label among T1, T2, T3 and T4, and we also have to ensure that Ti1 is not a leaf unless all of T1,...,T4 are.
+

T1 +

+ T4

T2 T3

+ Ti4

+ Ti3

Ti1 Ti2
