CMPE 152
Compiler Design
Lecture 27
Ahmed Ezzat
Optimization & Runtime
Environment
1 Ahmed Ezzat
Outline
⚫ Code Optimizations Topics
⚫ Compiler Code Optimization
❑ High-level
❑ Advanced-level
⚫ Summary
Where We Are
Source Code → Lexical Analysis → Syntax Analysis → Semantic Analysis →
IR Generation → IR Optimization → Code Generation → Optimization → Machine Code
Code Optimizations Topics:
High-level Overview
⚫ Goal: improve code to be faster and/or smaller (more
efficient)
❑ Optimized code:
❖ Executes faster
❖ Uses memory efficiently
❖ Yields better performance
❑ Compilers can be designed to provide code optimization.
❑ Users should focus only on optimizations the compiler cannot
provide, such as the choice of algorithm, i.e., choosing a faster
and/or less memory-intensive algorithm.
Code Optimizations Topics:
High-level Overview
❑ Requirements
❖ Improved code produces the same output
❖ Improved code has the same side effects
❖ Improved code is as correct as the original
❑ Example tasks
❖ Eliminate redundant computations
❖ Use registers efficiently
❖ Eliminate loops or move computation out of loops, when appropriate
Code Optimizations Topics:
Where Optimization can Occur
⚫ Programmer
❑ Choice of algorithm
❑ Intelligent coding
⚫ Compiler
❑ Choice of instructions
❑ Moving code
❑ Reordering code
❑ Strength reduction*
❑ Must be faithful to original program
⚫ Processor
❑ Pipelining
❑ Multiple execution units
❑ Memory accesses
❑ Branches
❑ Caches
⚫ Rest of system
❑ Uncontrollable

* Strength reduction: an optimization where a function is computed more
efficiently using previous values of the function, or where an expensive
operation is replaced with a less expensive one (e.g., replacing a
multiplication with a shift).
Code Optimizations Topics:
Overview
⚫ Machine-independent optimizations
❑ Code motion
❑ Reduction in strength
❑ Common sub-expression sharing
⚫ Tuning: identifying performance bottlenecks
⚫ Machine-dependent optimizations
❑ Efficient use of registers
❑ Pointer code
❑ Loop unrolling
❑ Enabling instruction-level parallelism
⚫ Understanding processor optimization
❑ Translation of instructions into operations
❑ Out-of-order (dynamic) execution – execution order factors in
the availability of data
❑ Branches
❑ Caches and blocking
Code Optimizations Topics:
Limitations of Compiler Optimization
⚫ Compilers operate under a fundamental constraint
– Must not cause any change in program behavior under any possible
condition
– Often prevents making optimizations that would only affect behavior
under pathological conditions
⚫ Behavior that may be obvious to the programmer can be
obfuscated by languages and coding styles
– E.g., data ranges may be more limited than variable types suggest
⚫ Most analysis is performed only within procedures (code block)
– Whole-program analysis is too expensive in most cases
⚫ Most analysis is based only on static information
– Compiler has difficulty anticipating run-time inputs
⚫ When in doubt, the compiler must be conservative
Code Optimizations Topics:
Machine-Independent Optimizations
⚫ Optimizations you should do regardless of processor /
compiler
⚫ Code Motion
– Reduce the frequency with which a computation is performed
⚫ If it will always produce same result
⚫ Especially moving code out of loop
Before:
for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
        a[n*i + j] = b[j];

After:
for (i = 0; i < n; i++) {
    int ni = n*i;
    for (j = 0; j < n; j++)
        a[ni + j] = b[j];
}
Code Optimizations Topics:
Machine-Independent Optimizations
⚫ Most compilers do a good job with array code + simple loop structures

Original:
for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
        a[n*i + j] = b[j];

Optimized (pointer form):
for (i = 0; i < n; i++) {
    int ni = n*i;
    int *p = a+ni;
    for (j = 0; j < n; j++)
        *p++ = b[j];
}
⚫ Code generated by GCC:

imull %ebx,%eax            # i*n
movl 8(%ebp),%edi          # a
leal (%edi,%eax,4),%edx    # p = a+i*n (scaled by 4)
# Inner Loop
.L40:
movl 12(%ebp),%edi         # b
movl (%edi,%ecx,4),%eax    # b+j (scaled by 4)
movl %eax,(%edx)           # *p = b[j]
addl $4,%edx               # p++ (scaled by 4)
incl %ecx                  # j++
cmpl %ebx,%ecx             # j : n (reversed)
jl .L40                    # loop if j<n
Code Optimizations Topics:
Machine-Independent Optimizations
⚫ Strength reduction
❑ Replace costly operation with simpler one
❑ Shift, add instead of multiply or divide
16*x → x << 4
⚫ Utility is machine-dependent
⚫ Depends on cost of multiply or divide instruction
⚫ On Pentium II or III, integer multiply only requires 4 CPU cycles
❑ Recognize sequence of products

Before:
for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
        a[n*i + j] = b[j];

After:
int ni = 0;
for (i = 0; i < n; i++) {
    for (j = 0; j < n; j++)
        a[ni + j] = b[j];
    ni += n;
}
Code Optimizations Topics:
Machine-Independent Optimizations
⚫ Share Common Sub-expressions
❑ Reuse portions of expressions
❑ Compilers often unsophisticated about exploiting arithmetic
properties
/* Sum neighbors of i,j */
up    = val[(i-1)*n + j];
down  = val[(i+1)*n + j];
left  = val[i*n + j-1];
right = val[i*n + j+1];
sum = up + down + left + right;
(3 multiplications: i*n, (i-1)*n, (i+1)*n)

/* With the shared sub-expression */
int inj = i*n + j;
up    = val[inj - n];
down  = val[inj + n];
left  = val[inj - 1];
right = val[inj + 1];
sum = up + down + left + right;
(1 multiplication: i*n)
Code Optimizations Topics:
Machine-dependent Optimizations
⚫ Efficient use of registers: Reading and writing
registers is much faster than reading/writing memory
⚫ Limitations
❑ Compiler not always able to determine whether variable can
be held in register
❑ Possibility of aliasing
❑ See example later
Code Optimizations Topics:
Machine-dependent Optimizations
⚫ Pointer vs. Array Code Inner Loops:
❑ Array Code
.L24: # Loop:
addl (%eax,%edx,4),%ecx # sum += data[i]
incl %edx # i++
cmpl %esi,%edx # i:length
jl .L24 # if < goto Loop
❑ Pointer Code
.L30: # Loop:
addl (%eax),%ecx # sum += *data
addl $4,%eax # data++
cmpl %edx,%eax # data:dend
jb .L30 # if < goto Loop
❑ Performance:
➢ Array Code: 4 instructions in 2 clock cycles
➢ Pointer Code: almost the same 4 instructions, but 3 clock cycles
Code Optimizations Topics:
Machine-dependent Optimizations
⚫ Inner Loops:

Original:
for (i = 0 ; i < n ; i++)
    sum += v[i];
*dest = sum;

Pointer version:
void combine4p(vec_ptr v, int *dest)
{
    int length = vec_length(v);
    int *data = get_vec_start(v);
    int *dend = data+length;
    int sum = 0;
    while (data < dend) {
        sum += *data;
        data++;
    }
    *dest = sum;
}
⚫ Optimization
❑ Use pointers rather than array references
❑ Compiled -O2
➢ Oops! We’re not making progress here!
Compiler Code Optimizations:
⚫ The optimizer sits between the front end and the
code generator (back end):
❑ Works with intermediate code.
❑ Can do control flow analysis.
❑ Can do data flow analysis.
❑ Does transformations to improve the intermediate
code.
Compiler Code Optimizations:
High-level
⚫ Optimizations provided by a compiler at a high
level include:
❑ Inlining small functions
❑ Code hoisting (move code outside loop)
❑ Dead store elimination
❑ Eliminating common sub-expressions
❑ Loop unrolling (optimization)
❑ Loop optimizations: Code motion, Induction variable
elimination (reduce add/subtract in a loop), and
Reduction in strength.
Compiler Code Optimizations:
High-level
⚫ Example for Induction Variable Elimination:
Before:
int a[SIZE];
int b[SIZE];
void f (void)
{
    int i1, i2, i3;
    for (i1 = 0, i2 = 0, i3 = 0; i1 < SIZE; i1++)
        a[i2++] = b[i3++];
    return;
}

After:
int a[SIZE];
int b[SIZE];
void f (void)
{
    int i1;
    for (i1 = 0; i1 < SIZE; i1++)
        a[i1] = b[i1];
    return;
}
Compiler Code Optimizations:
High-level
⚫ Inlining small functions:
❑ Repeatedly inserting the function body instead of
calling it saves the calling overhead and enables
further optimizations.
❑ Inlining large functions will make the executable
too large.
Compiler Code Optimizations:
High-level
⚫ Code hoisting:
❑ Moving computations outside loops saves computing time
❑ In the following example, (2.0 * PI) is an invariant expression;
there is no reason to re-compute it 100 times.
DO i = 1, 100
ARRAY(i) = 2.0 * PI * i
END DO
❑ By introducing a temporary variable 't' it can be transformed to:
t = 2.0 * PI
DO i = 1, 100
ARRAY(i) = t * i
END DO
Compiler Code Optimizations:
High-level
⚫ Dead Store elimination:
❑ If the compiler detects variables that are never used, it
may safely ignore many of the operations that compute
their values.
Compiler Code Optimizations:
High-level
⚫ Eliminating common sub-expressions:
❑ Optimization compilers are able to perform quite well:
X = A * LOG(Y) + (LOG(Y) ** 2)
❑ Introduce an explicit temporary variable t:
t = LOG(Y)
X = A * t + (t ** 2)
❑ This saves one 'heavy' function call by eliminating the common
sub-expression LOG(Y); the exponentiation can then be removed as well:
t = LOG(Y)
X = (A + t) * t
Compiler Code Optimizations:
High-level
⚫ Loop unrolling:
❑ The loop exit checks cost CPU time.
❑ Loop unrolling tries to get rid of the checks completely
or to reduce the number of checks.
❑ If you know a loop is performed a fixed number of times, or
that the number of repetitions is a multiple of a constant,
you can unroll the loop.
Compiler Code Optimizations:
High-level
⚫ Loop unrolling (Example):
// old loop
for(int i=0; i<3; i++) {
color_map[n+i] = i;
}
// unrolled version – eliminated the check (i<3)
int i = 0;
color_map[n+i] = i;
i++;
color_map[n+i] = i;
i++;
color_map[n+i] = i;
Compiler Code Optimizations:
High-level
⚫ Loop optimization: Code Motion
❑ Any code inside a loop that always computes the same
value can be moved outside/before the loop.
❑ Example:
while (i <= limit-2)
do {loop code}
where the loop body doesn't change the limit variable. Without code
motion, the subtraction limit-2 is re-computed on every iteration.
Code motion substitutes:
t = limit-2;
while (i <= t)
do {loop code}
Compiler Code Optimizations:
High-level
⚫ In Summary:
❑ Compilers can provide some code optimization.
❑ Programmers do not have to worry about such optimizations.
❑ Program definition must be preserved.
Compiler Code Optimizations:
Advanced-level
⚫ Intermediate Representation (IR):
❑ The best place for optimizations
❑ Each instruction is simple
❑ By analyzing a sequence of IR instructions, one can
accumulate information about the definitions and uses
of names
❑ Can detect patterns and replace them with better code
Code Optimizations:
Advanced-level
⚫ Control Flow Graph:
❑ Basic blocks are nodes
❑ Flow of control as edges
❑ Capture the dynamic structure (returns,
branches, jumps)
Code Optimizations:
Advanced-level
⚫ Basic Blocks:
❑ A list of instructions
❑ Once the block is entered, each instruction in that
basic block is executed exactly once
❑ Only the first statement can be reached from
outside the block (no jumping in)
❑ Execution continues from first instruction to last in
the block (no jumping out)
Code Optimizations:
Advanced-level
⚫ Basic Blocks (Contd.):
❑ Begins (Leader statements)
❖ Procedure entry point
❖ Label (branch target)
❖ Instruction after branch or return
❑ Ends
❖ Jump
❖ Branch (conditional or unconditional)
❖ Return
Code Optimizations:
Advanced-level
⚫ Local optimization within the basic block:
❑ Rearrange instructions, or replace them with different
code that does the same thing
❑ Reordering within a block is okay because we know
everything gets executed exactly once
❑ Some local techniques generalize across blocks
Code Optimizations:
Advanced-level
⚫ Global optimization is hard:
❑ Need to be able to handle multiple
predecessors/successors for a basic block.
❑ Need to be able to handle multiple paths through the
control-flow graph and may need to iterate multiple
times to compute the final value (but the analysis still
needs to terminate)
❑ Need to be able to assign each basic block a
reasonable default value before we've analyzed it.
Code Optimizations:
Advanced-level
⚫ Structure-preserving transformation on Basic
Blocks:
❑ Common sub-expression computation elimination
❖ A common sub-expression E means that E was previously
computed, and the values of variables in E have not changed
since the previous computation
❑ Dead code elimination
❖ Code that is never reached
❖ Code that does not affect the result of the program
Code Optimizations:
Advanced-level
⚫ Structure-preserving transformation on Basic
Blocks (Contd.):
❑ Renaming of temporary variables
❖ Produces normal-form blocks: blocks in which temporary variables
are not reused. Any temporary t (e.g., t = b + c) can be renamed to
a fresh variable (u = b + c), with all uses of that instance of t
updated; the value computed by the basic block is unchanged.
❖ Helps decide whether two blocks are equivalent
❖ Allows interchange of statements
❑ Interchange of two independent adjacent statements
Summary
⚫ How do your code and data look during execution?
⚫ Interaction among compiler, OS, and target machine
(Assembly code)
⚫ Two main themes with memory management:
❑ Allocation of storage locations
❑ Memory Mgmt & Access to variables and data
END