
Unit-5

PRINCIPAL SOURCES OF OPTIMIZATION


 A compiler optimization must preserve the semantics of the original program.
 Two major transformation techniques are used for the optimization of a program:
1) Function-Preserving Transformations
2) Loop Optimizations
1) Function-Preserving Transformations:
• There are a number of ways in which a compiler can improve a program without
changing the function it computes.
• The function-preserving transformations are:

i) Common sub-expression elimination
ii) Copy propagation
iii) Dead-code elimination
iv) Constant folding
i) Common sub-expression elimination:
 An occurrence of an expression E is called a common sub-expression if E was previously
computed, and the values of variables in E have not changed since the previous
computation.
 We can avoid recomputing the expression if we can use the previously computed value.
 For example
t1 := 4*i
t2 := a[t1]
t3 := 4*j
t4 := 4*i
t5 := n
t6 := b[t4] + t5
The code can be optimized using common sub-expression elimination as
t1 := 4*i
t2 := a[t1]
t3 := 4*j
t5 := n
t6 := b[t1] + t5

The common sub-expression t4 := 4*i is eliminated because its value has already been
computed into t1, and the value of i has not changed between that definition and this use.
ii) Copy propagation
 Assignments of the form f := g are called copy statements, or copies for short. The idea
behind the copy-propagation transformation is to use g for f wherever possible after the
copy statement f := g. Copy propagation means using one variable instead of another.
This may not appear to be an improvement by itself, but as we shall see, it gives us an
opportunity to eliminate the copy variable.
 For example:
x = Pi;
……
A = x * r * r;
The optimization using copy propagation can be done as follows:
A = Pi * r * r;
Here the variable x is eliminated.
iii) Dead-Code Elimination:
 A variable is live at a point in a program if its value can be used subsequently; otherwise,
it is dead at that point. A related idea is dead or useless code, statements that compute
values that never get used.
 While the programmer is unlikely to introduce any dead code intentionally, it may appear
as the result of previous transformations.
 An optimization can be done by eliminating dead code.
Example:
i=0;
if(i==1)
{
a=b+5;
}
Here, the ‘if’ statement is dead code because the condition can never be satisfied.
iv) Constant folding:
Deducing at compile time that the value of an expression is a constant, and using the
constant instead, is known as constant folding.
 For example,
a = 3.14157/2 can be replaced by
a = 1.570785, thereby eliminating a division operation.
2) Loop Optimizations:
Programs tend to spend the bulk of their time in loops, especially inner loops.
The running time of a program may therefore be improved if we decrease the number of
instructions in an inner loop, even if we increase the amount of code outside that loop.
Three techniques are important for loop optimization:
i) Code motion: moves loop-invariant code outside a loop.
ii) Induction-variable elimination: eliminates auxiliary induction variables from an
inner loop.
iii) Reduction in strength: replaces an expensive operation by a cheaper one, such
as a multiplication by an addition.
i) Code motion
An important modification that decreases the amount of code in a loop is code motion.
This transformation takes an expression that yields the same result independent of the
number of times a loop is executed ( a loop-invariant computation) and places the
expression before the loop.
 For example:
while (i <= limit-2) /* statement does not change limit */
Code motion will result in the equivalent of
t = limit-2;
while (i <= t) /* statement does not change limit or t */
ii) Induction Variable Elimination

 Induction variable elimination is used to eliminate auxiliary induction variables from
an inner loop.

 It can reduce the number of additions in a loop, improving both code space and run-time
performance.
Suppose a loop decrements j by 1 on each iteration and computes t4 := 4*j inside the loop
(the source refers to a figure that is not reproduced here). Since t4 changes in lock step
with j, we can replace the assignment t4 := 4*j by t4 := t4 - 4. The only problem is that
t4 does not have a value when we enter block B2 for the first time, so we place an
initialization t4 := 4*j on entry to block B2. A sketch of the transformation follows.
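A minimal C sketch of the idea, assuming a count-down loop and an illustrative array use
(the block structure of the original figure is not reproduced):

/* Before: 4*j is recomputed with a multiplication on every iteration. */
void before(int a[], int n) {          /* assumes a[] has at least 4*n slots */
    int j = n;
    while (j > 0) {
        j = j - 1;
        int t4 = 4 * j;                /* multiply inside the loop */
        a[t4] = 0;
    }
}

/* After: t4 is maintained by subtraction as an induction variable. */
void after(int a[], int n) {
    int j = n;
    int t4 = 4 * j;                    /* initialization placed on loop entry */
    while (j > 0) {
        j = j - 1;
        t4 = t4 - 4;                   /* invariant: t4 == 4*j */
        a[t4] = 0;
    }
}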

iii) Reduction In Strength:


o Strength reduction is used to replace an expensive operation by a cheaper one on
the target machine.
o Addition of a constant is cheaper than a multiplication, so we can replace a
multiplication with an addition within the loop.
o Multiplication is cheaper than exponentiation, so we can replace exponentiation with
multiplication within the loop.

while (i<10)
{
j= 3 * i+1;
a[j]=a[j]-2;
i=i+2;
}
After strength reduction the code will be:

s= 3*i+1;
while (i<10)
{
j=s;
a[j]= a[j]-2;
i=i+2;
s=s+6;
}

INTRODUCTION TO DATA FLOW ANALYSIS:


 It is the analysis of the flow of data in a control flow graph, i.e., the analysis that
determines the information regarding the definition and use of data in a program.
 With the help of this analysis, optimization can be done. In general, it is the process
by which values of interest are computed at each program point; the resulting data flow
properties represent information that can be used for optimization.
 Data flow analysis is a technique used in compiler design to analyze how data flows
through a program. It involves tracking the values of variables and expressions as they
are computed and used throughout the program, with the goal of identifying
opportunities for optimization and identifying potential errors.
 The basic idea behind data flow analysis is to model the program as a graph, where the
nodes represent program statements and the edges represent data flow dependencies
between the statements. The data flow information is then propagated through the graph,
using a set of rules and equations to compute the values of variables and expressions at
each point in the program.
Basic Terminologies –
 Definition Point: a point in a program containing a definition.
 Reference Point: a point in a program containing a reference to a data item.
 Evaluation Point: a point in a program containing the evaluation of an expression.
Data Flow Properties –
i) Available Expression – An expression is said to be available at a program point x if
it is computed along every path reaching x. An expression is available at its evaluation
point. An expression a+b is said to be available if none of its operands is modified
between its computation and its use.

Example –
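A minimal C sketch, with illustrative variables (the source's example figure is not
reproduced):

int g(int a, int b, int c) {
    int t;
    if (c > 0)
        t = a + b;        /* a + b computed on the true path  */
    else
        t = a + b;        /* a + b computed on the false path */
    /* a + b is available here: it is computed along every path
       reaching this point, and neither operand changed since. */
    return t + (a + b);   /* the second a + b can reuse t */
}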

• Advantage –
It is used to eliminate common sub-expressions.
ii) Reaching Definition – A definition D reaches a point x if there is a path from D to
x along which D is not killed, i.e., not redefined.
Example –
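A minimal C sketch with hypothetical definition labels d1 and d2:

#include <stdio.h>

void h(int c) {
    int x = 1;            /* d1: definition of x          */
    if (c > 0)
        x = 2;            /* d2: kills d1 along this path */
    /* Both definitions reach this point: d1 survives along the
       false branch, d2 along the true branch.                 */
    printf("%d\n", x);
}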

Advantage –
It is used in constant and variable propagation.
iii) Live variable – A variable is said to be live at a point p if its value is used along
some path starting at p before it is redefined; otherwise it is dead at p.
Example –
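A minimal C sketch (illustrative variables):

int live(int a, int b) {
    int x = a + b;        /* x is live here: used below before redefinition */
    int y = x * 2;        /* last use of this value of x                    */
    /* x is dead here: it is redefined before any further use. */
    x = 0;
    return y + x;
}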

Advantage –
1. It is useful for register allocation.
2. It is used in dead code elimination.
iv) Busy Expression – An expression is busy along a path if it is evaluated along that
path and no definition of any of its operands appears before that evaluation along the path.
Advantage –
It is used for performing code-movement optimization.
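A minimal C sketch (illustrative variables): b - a is busy at the branch point because
both paths evaluate it before either operand is redefined, so its computation could be
moved above the branch:

int busy(int a, int b, int c) {
    int t;
    if (c > 0)
        t = b - a;            /* b - a evaluated on this path         */
    else
        t = (b - a) * 2;      /* and on this path, operands untouched */
    return t;
}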
Features of Data Flow Analysis:
Identifying dependencies: Data flow analysis can identify dependencies between
different parts of a program, such as variables that are read or modified by multiple
statements.
Detecting dead code: By tracking how variables are used, data flow analysis can detect
code that is never executed, such as statements that assign values to variables that are
never used.
Optimizing code: Data flow analysis can be used to optimize code by identifying
opportunities for common subexpression elimination, constant folding, and other
optimization techniques.
Detecting errors: Data flow analysis can detect errors in a program, such as
uninitialized variables, by tracking how variables are used throughout the program.
Handling complex control flow: Data flow analysis can handle complex control flow
structures, such as loops and conditionals, by tracking how data is used within those
structures.
Interprocedural analysis: Data flow analysis can be performed across multiple
functions in a program, allowing it to analyze how data flows between different parts of
the program.
Scalability: Data flow analysis can be scaled to large programs, allowing it to analyze
programs with many thousands or even millions of lines of code.

Foundations of Data-Flow Analysis:


 In order to do code optimization and a good job of code generation, the compiler needs to
collect information about the program as a whole and to distribute this information to
each block in the flow graph.

 A compiler can take advantage of "reaching definitions", such as knowing where a
variable like debug was last defined before reaching a given block, in order to perform
transformations. Reaching definitions are just one example of the data-flow information
that an optimizing compiler collects by a process known as data-flow analysis.

 Data-flow information can be collected by setting up and solving systems of equations of
the form:

out[S] = gen[S] U (in[S] – kill[S])

This equation can be read as “ the information at the end of a statement is either generated
within the statement , or enters at the beginning and is not killed as control flows through
the statement.”
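For instance, if statement S is d1: x := 5, then gen[S] = {d1} and kill[S] contains every
other definition of x; with in[S] = {d2, d3}, where d2 defines x and d3 defines y, the
equation gives out[S] = {d1} U ({d2, d3} – {d2}) = {d1, d3}. The sketch below shows one
way such equations can be solved iteratively for reaching definitions over a small flow
graph, using bitsets; the graph shape and the gen/kill values are hypothetical, not taken
from the source.

#include <stdio.h>

#define NBLOCKS 4
typedef unsigned int Set;   /* bit d set => definition d is in the set */

/* Hypothetical gen/kill sets and predecessor lists for a 4-block graph
   B1 -> B2, B1 -> B3, B2 -> B4, B3 -> B4. */
Set gen_s[NBLOCKS]  = {0x3, 0x4, 0x8, 0x10};
Set kill_s[NBLOCKS] = {0xC, 0x1, 0x4, 0x0};
int npred[NBLOCKS]  = {0, 1, 1, 2};
int pred[NBLOCKS][NBLOCKS] = {{0}, {0}, {0}, {1, 2}};

Set in_s[NBLOCKS], out_s[NBLOCKS];

int main(void) {
    for (int b = 0; b < NBLOCKS; b++) out_s[b] = gen_s[b];
    int changed = 1;
    while (changed) {                           /* iterate to a fixed point */
        changed = 0;
        for (int b = 0; b < NBLOCKS; b++) {
            Set in_new = 0;
            for (int p = 0; p < npred[b]; p++)
                in_new |= out_s[pred[b][p]];    /* in[B] = union of out[P] */
            Set out_new = gen_s[b] | (in_new & ~kill_s[b]);  /* the equation */
            if (in_new != in_s[b] || out_new != out_s[b]) changed = 1;
            in_s[b] = in_new;
            out_s[b] = out_new;
        }
    }
    for (int b = 0; b < NBLOCKS; b++)
        printf("B%d: in=0x%02x out=0x%02x\n", b + 1, in_s[b], out_s[b]);
    return 0;
}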

 The details of how data-flow equations are set up and solved depend on three factors.

 The notions of generating and killing depend on the desired information, i.e., on the
data-flow analysis problem to be solved. Moreover, for some problems, instead of
proceeding along the flow of control and defining out[S] in terms of in[S], we need to
proceed backwards and define in[S] in terms of out[S].

 Since data flows along control paths, data-flow analysis is affected by the constructs in a
program. In fact, when we write out[S] we implicitly assume that there is a unique end point
where control leaves the statement; in general, equations are set up at the level of basic
blocks rather than statements, because blocks have unique end points.

 There are subtleties that go along with such statements as procedure calls, assignments
through pointer variables, and even assignments to array variables.

Points and Paths:

 Within a basic block, we talk of the point between two adjacent statements, as well as the
point before the first statement and after the last. Thus, block B1 has four points: one
before any of the assignments and one after each of the three assignments.
(The flow graph from the source, shown here as a block listing:
B1: d1: i := m-1; d2: j := n; d3: a := u1
B2: d4: i := i+1
B3: d5: j := j-1
B4: (no definitions)
B5: (no definitions)
B6: d6: a := u2)

 Now let us take a global view and consider all the points in all the blocks. A path from p1
to pn is a sequence of points p1, p2, ..., pn such that for each i between 1 and n-1, either

 pi is the point immediately preceding a statement and pi+1 is the point immediately
following that statement in the same block, or

 pi is the end of some block and pi+1 is the beginning of a successor block.

Reaching definitions:

 A definition of a variable x is a statement that assigns, or may assign, a value to x. The
most common forms of definition are assignments to x and statements that read a value
from an i/o device and store it in x.

 These statements certainly define a value for x, and they are referred to as unambiguous
definitions of x. There are certain kinds of statements that may define a value for x; they
are called ambiguous definitions. The most usual forms of ambiguous definitions of x
are:

 A call of a procedure with x as a parameter, or of a procedure that can access x because
x is in the scope of the procedure.

 An assignment through a pointer that could refer to x. For example, the assignment
*q := y is a definition of x if it is possible that q points to x. In the absence of
information about where q can point, we must assume that such an assignment may define
every variable.
 We say a definition d reaches a point p if there is a path from the point immediately
following d to p such that d is not "killed" along that path. Only unambiguous definitions
kill; thus a point can be reached by an unambiguous definition of a variable even though an
ambiguous definition of the same variable appears later along the path.
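A minimal C sketch of this subtlety (hypothetical labels d1 and d2):

#include <stdio.h>

void amb(int *q) {
    int x = 1;    /* d1: unambiguous definition of x                     */
    *q = 5;       /* d2: ambiguous definition; defines x only if q == &x */
    /* d1 still reaches this point: an ambiguous definition "may"
       kill d1 but does not certainly kill it.                          */
    printf("%d\n", x);
}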

Constant Propagation:
Constant propagation is one of the local code optimization techniques in compiler
design. It can be defined as the process of replacing variables in an expression with
their known constant values. In simpler words, if a variable is assigned a known
constant, then we can simply replace uses of that variable with the constant. Constants
assigned to a variable can be propagated through the flow graph and substituted wherever
the variable is used.
Constant propagation is performed using the results of reaching-definition analysis:
if all reaching definitions of a variable assign it the same constant, then the variable
has a constant value at that point and can be substituted with the constant.
Suppose we use a variable pi and assign it the value 22/7:
pi = 22/7 = 3.14
Here the compiler has to perform the division, an expensive operation, and then assign
the computed result 3.14 to the variable pi. If the division were left in the code, it
would have to be re-evaluated each time the value is needed. This is not a good idea when
the compiler can instead compute the constant once and directly assign the value 3.14 to
pi, thus reducing the time needed for the code to run.
Constant propagation also reduces the number of cases where values are directly copied
from one location or variable to another simply in order to pass their value along.
For example, consider the following pseudocode:
a = 30
b = 20 - a /2
c = b * ( 30 / a + 2 ) - a
We can see that in the first statement the variable a is assigned the constant value 30.
When the compiler comes to the second statement it encounters a, so it goes back to the
first statement to look up the value of a and substitutes 30 before evaluating the second
statement. It then reaches the third statement, encounters b and a again, and must consult
the earlier statements once more in order to compute the value of c. Thus, the value of a
needs to be propagated three times. This procedure is time consuming.
We can instead rewrite the same code as:
a = 30
b = 20 - 30/2
c = b * ( 30 / 30 + 2) - 30
This updated code is faster than the previous code because the compiler does not
need to repeatedly go back to earlier expressions to look up and copy the value of a
variable in order to compute the current expression. This saves a lot of time, reducing
the running time and allowing operations to be performed more efficiently.
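If constant folding is then applied on top of the propagated code, the expressions
collapse entirely (a worked continuation, not shown in the source): 20 - 30/2 folds to 5,
so b = 5, and 5 * (30/30 + 2) - 30 folds to -15, so c = -15.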
Note that the behavior of constant propagation depends on the compiler: some compilers
perform constant propagation only within basic blocks, while others perform it across
more complex control flow.

Partial-Redundancy Elimination:

A redundant piece of code contains expressions or statements that repeat themselves or
produce the same results throughout the execution flow of the code. Similarly, a partially
redundant piece of code has redundancy on one or more execution paths of the code but not
necessarily on all paths of execution.

These redundancies may exist in various forms, such as common sub-expressions,
loop-invariant expressions, etc.

For example, consider the code sketched below (figure 1a in the source): the expression
"b / c" is evaluated twice along one flow path, i.e., when the condition b > c is true,
even though there is no change in the values of the variables b and c. Hence, this
particular piece of code is partially redundant. The redundancy can be eliminated if we
compute the expression "b / c" once and store it in a variable, say t. Then we can use
the variable t wherever we need the value of the expression "b / c".
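A minimal C sketch of the idea, assuming hypothetical variables (the source's figure 1a
is not reproduced):

#include <stdio.h>

/* Partially redundant: b / c is evaluated twice on the b > c path. */
void before(int b, int c) {
    int d = 0;
    if (b > c)
        d = b / c;            /* first evaluation             */
    int e = b / c;            /* re-evaluated even when b > c */
    printf("%d %d\n", d, e);
}

/* After elimination: b / c is computed once per path into t. */
void after(int b, int c) {
    int d = 0, t;
    if (b > c) {
        t = b / c;            /* single evaluation on this path    */
        d = t;
    } else {
        t = b / c;            /* inserted copy makes the later use
                                 fully redundant on every path     */
    }
    int e = t;                /* reuse of the stored value         */
    printf("%d %d\n", d, e);
}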
The Lazy-Code-Motion Algorithm

After code is optimized to eliminate redundancy, it is expected to have the
following characteristics:

1. All common expressions that can be eliminated without duplicating code are
eliminated.
2. The optimized code does not introduce any computations that were not in the
original code.
3. Expressions are computed at the latest possible time.

The intuition of partial redundancy of a single expression:

Consider an expression E which is redundant in block A, meaning that E has been
computed on all the execution paths that reach block A. In this case, there must exist
a set of blocks, say B, each of which contains a computation of E, making E redundant
in block A. The set of outgoing edges from B in the flow graph necessarily forms a
cut-set which, if removed, disconnects block A from the entry of the code.

Anticipation of Expressions:

An expression E is said to be anticipated at a point P if the values used in E are
available at that point and all the execution paths leading from P evaluate E.

Algorithm:

Lazy-Code-Motion:

Step 1: Find all the anticipated expressions using a backward data-flow pass at each point
in the code.

Step 2: Place the expressions at the points where their values are anticipated along some
execution path.

Step 3: Find the postponable expressions using a forward data-flow pass and place them at
the points in the code where they cannot be postponed further. Postponable expressions are
those which are anticipated at some point in the code but are not used for a long span of
the flow of code.

Step 4: Remove all temporary-variable assignments that are used only once in the
complete code.
Loops in Flow Graphs:

A graph representation of three-address statements, called a flow graph, is useful for
understanding code-generation algorithms, even if the graph is not explicitly constructed
by a code-generation algorithm. Nodes in the flow graph represent computations, and the
edges represent the flow of control.

Dominators:

In a flow graph, a node d dominates a node n if every path from the initial node of the
flow graph to n goes through d. This is denoted d dom n. The initial node dominates all
the remaining nodes in the flow graph, and the entry of a loop dominates all nodes in
the loop. Every node dominates itself.

Example:
In the flow graph described by the dominator sets D(n) listed at the end of this section,

* The initial node, node 1, dominates every node.
* Node 2 dominates only itself.
* Node 3 dominates all but 1 and 2.
* Node 4 dominates all but 1, 2 and 3.
* Nodes 5 and 6 dominate only themselves, since the flow of control can skip around
either by going through the other.
* Node 7 dominates 7, 8, 9 and 10.
* Node 8 dominates 8, 9 and 10.
* Nodes 9 and 10 dominate only themselves.
A convenient way of presenting dominator information is in a tree, called the dominator
tree, in which
• The initial node is the root.
• The parent of each node is its immediate dominator.
• Each node d dominates only its descendants in the tree.

The existence of the dominator tree follows from a property of dominators: each node n
has a unique immediate dominator m that is the last dominator of n on any path from the
initial node to n. In terms of the dom relation, the immediate dominator m has the
property that if d != n and d dom n, then d dom m.

D(1)={1}
D(2)={1,2}
D(3)={1,3}
D(4)={1,3,4}
D(5)={1,3,4,5}
D(6)={1,3,4,6}
D(7)={1,3,4,7}
D(8)={1,3,4,7,8}
D(9)={1,3,4,7,8,9}
D(10)={1,3,4,7,8,10}
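The D(n) sets above can be computed by a standard iterative fixed-point algorithm:
initialize D(n) to the set of all nodes for every n except the initial node, then
repeatedly set D(n) = {n} U the intersection of D(p) over all predecessors p of n.
The C sketch below implements this with bitsets; the edge list is a reconstruction
consistent with the D(n) sets above and the back edges listed in the next section, so
treat it as an assumption rather than the source's figure.

#include <stdio.h>

#define N 10                       /* nodes 1..10; bit m-1 represents node m */
typedef unsigned int Set;

/* Reconstructed predecessor lists (an assumption, see above):
   edges 1->2, 1->3, 2->3, 3->4, 4->3, 4->5, 4->6, 5->7, 6->7,
   7->4, 7->8, 8->3, 8->9, 8->10, 9->1, 10->7.                  */
int npred[N + 1]   = {0, 1, 1, 4, 2, 1, 1, 3, 1, 1, 1};
int pred[N + 1][4] = {{0}, {9}, {1}, {1, 2, 4, 8}, {3, 7},
                      {4}, {4}, {5, 6, 10}, {7}, {8}, {8}};

int main(void) {
    Set all = (1u << N) - 1, dom[N + 1];
    dom[1] = 1u << 0;                       /* D(1) = {1} */
    for (int n = 2; n <= N; n++) dom[n] = all;

    int changed = 1;
    while (changed) {                       /* iterate to a fixed point */
        changed = 0;
        for (int n = 2; n <= N; n++) {
            Set s = all;
            for (int p = 0; p < npred[n]; p++)
                s &= dom[pred[n][p]];       /* intersect over predecessors */
            s |= 1u << (n - 1);             /* every node dominates itself */
            if (s != dom[n]) { dom[n] = s; changed = 1; }
        }
    }
    for (int n = 1; n <= N; n++) {
        printf("D(%d) = {", n);
        for (int m = 1; m <= N; m++)
            if (dom[n] & (1u << (m - 1))) printf(" %d", m);
        printf(" }\n");
    }
    return 0;
}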
Natural Loops:

 One application of dominator information is in determining the loops of a flow
graph suitable for improvement. There are two essential properties of such loops:

 A loop must have a single entry point, called the header. This entry point dominates
all nodes in the loop, or it would not be the sole entry to the loop.
 There must be at least one way to iterate the loop, i.e., at least one path back to
the header.
One way to find all the loops in a flow graph is to search for edges in the flow graph
whose heads dominate their tails. If a→b is an edge, b is the head and a is the tail.
Such edges are called back edges.
Example:

In the graph described above, the back edges are:

7 → 4, since 4 dom 7
10 → 7, since 7 dom 10
4 → 3, since 3 dom 4
8 → 3, since 3 dom 8
9 → 1, since 1 dom 9

Each of these edges gives rise to a loop in the flow graph. Given a back edge n → d, we
define the natural loop of the edge to be d plus the set of nodes that can reach n without
going through d. Node d is the header of the loop. A sketch of this construction follows.
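A C sketch of the natural-loop construction, reusing the Set, N, npred and pred
declarations from the dominator sketch above (so the same reconstruction caveat applies):
starting from the tail n, walk predecessors backwards, never walking past the header d.

/* Natural loop of back edge n -> d: d plus every node that can
   reach n without going through d.                              */
Set natural_loop(int n, int d) {
    Set loop = (1u << (d - 1)) | (1u << (n - 1));
    int stack[N], top = 0;
    if (n != d) stack[top++] = n;
    while (top > 0) {
        int m = stack[--top];
        for (int p = 0; p < npred[m]; p++) {
            int q = pred[m][p];
            if (!(loop & (1u << (q - 1)))) {   /* q not yet in the loop   */
                loop |= 1u << (q - 1);         /* d is pre-marked, so the */
                stack[top++] = q;              /* walk never passes it    */
            }
        }
    }
    return loop;
}

For the back edge 7 → 4, for example, this walk collects {4, 5, 6, 7, 8, 10}, since 8
and 10 can reach 7 (via 8 → 10 → 7) without passing through the header 4.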
Inner loops:
If we use the natural loops as "the loops", then we have the useful property that unless
two loops have the same header, they are either disjoint or one is entirely contained in
the other. Thus, neglecting loops with the same header for the moment, we have a natural
notion of an inner loop: one that contains no other loop.
When two natural loops have the same header, but neither is nested within the other, they
are combined and treated as a single loop.
