UNIT II Program design and analysis
Software components. Representations of programs. Assembly and linking.
SITE
Models of programs
Source code is not a good representation for programs:
clumsy; leaves much information implicit.
Compilers derive intermediate representations to manipulate and optimize the program.
Data flow graph
DFG: data flow graph. Does not represent control. Models a basic block: code with exactly one entry point and one exit point, and no branches in between. Describes the minimal ordering requirements on operations.
Single assignment form
Original basic block:
x = a + b; y = c - d; z = x * y; y = b + d;
Single assignment form (each variable is assigned only once, so the second assignment to y is renamed y1):
x = a + b; y = c - d; z = x * y; y1 = b + d;
Data flow graph
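Single assignment form: x = a + b; y = c - d; z = x * y; y1 = b + d;
[Figure: DFG for this block. Inputs a, b, c, d feed the operator nodes: a + produces x, a - produces y, a * combines x and y into z, and a second + (b + d) produces y1.]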
DFGs and partial orders
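Partial order: a+b and c-d must both precede x*y; b+d depends on none of the other operations. Pairs of operations that the DFG does not order, such as a+b and c-d, can be done in any order.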
Control-data flow graph
CDFG: represents control and data. Uses data flow graphs as components. Two types of nodes:
decision; data flow.
Data flow node
Encapsulates a data flow graph:
x = a + b; y = c + d;
Write operations in basic block form for simplicity.
Control
[Figure: equivalent forms of a decision node: a two-way branch on cond with T and F exits, or a multi-way branch on value with exits v1, v2, v3, v4.]
CDFG example
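if (cond1) bb1(); else bb2();
bb3();
switch (test1) {
    case c1: bb4(); break;
    case c2: bb5(); break;
    case c3: bb6(); break;
}
[Figure: CDFG for this code. Decision node cond1 branches T to bb1() and F to bb2(); both paths join at bb3(), followed by decision node test1 with branches c1 to bb4(), c2 to bb5(), and c3 to bb6().]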
for loop
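for (i=0; i<N; i++) loop_body();
is equivalent to:
i=0;
while (i<N) {
    loop_body();
    i++;
}
[Figure: CDFG for the loop: i=0, then decision node i<N; the T branch runs loop_body() and returns to the test, and the F branch exits the loop.]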
Assembly and linking
Last steps in compilation:
[Figure: each HLL source file is compiled to assembly; each assembly file is assembled; the results are linked into a single executable.]
Multiple-module programs
Programs may be composed from several files. Addresses become more specific during processing:
relative addresses are measured relative to the start of a module; absolute addresses are measured relative to the start of the CPU address space.
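Generating a symbol table
        ORG 100
label1  ADR r4,c        ; PLC = 100
        LDR r0,[r4]
label2  ADR r4,d        ; PLC = 108
        LDR r1,[r4]
label3  SUB r0,r0,r1    ; PLC = 116
Symbol table:
label1  100
label2  108
label3  116
The program location counter (PLC) starts at the ORG address and advances by 4 bytes per instruction; each label is entered in the table with the current PLC value.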
        ORG 200
p1      ADR r4,a
        LDR r0,[r4]
        ADR r4,e
        LDR r1,[r4]
        ADD r0,r0,r1
        CMP r0,r1
        BNE q1
p2      ADR r4,e
Basic compilation phases
[Figure: compilation flow: HLL source, then parsing and symbol table generation, then machine-independent optimizations, then machine-dependent optimizations, producing assembly.]
Control code generation, contd.
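[Figure: flowchart for if (a+b>0) x=5; else x=7;. Node 1 tests a+b>0; the true branch leads to node 2, x=5; the false branch leads to x=7 (label3).]
        ADR r5,a
        LDR r1,[r5]
        ADR r5,b
        LDR r2,[r5]
        ADDS r3,r1,r2       ; S suffix sets the condition flags for BLE
        BLE label3          ; a+b <= 0: go to the false case
        MOV r3,#5           ; true case
        ADR r5,x
        STR r3,[r5]
        B stmtent           ; skip the false case
label3  MOV r3,#7           ; false case
        ADR r5,x
        STR r3,[r5]
stmtent ...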
Procedure linkage
Need code to:
call and return; pass parameters and results.
Parameters and return values are passed on the stack.
Procedures with few parameters may use registers.
Procedure stacks
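[Figure: the stack, with an arrow showing the direction of growth; proc1's frame is followed by proc2's frame, and the argument 5 passed to proc2 is accessed relative to SP.]
FP (frame pointer) marks the end of the last frame; SP (stack pointer) marks the end of the current frame.
Example: proc1(int a) { proc2(5); }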
ARM procedure linkage
APCS (ARM Procedure Call Standard):
r0-r3 pass parameters into the procedure; extra parameters are put on the stack frame.
r0 holds the return value.
r4-r7 hold register variables.
r11 is the frame pointer; r13 is the stack pointer.
r10 holds the stack limit address, used to check for stack overflow.
Data structures
Different types of data structures use different data layouts. Some offsets into a data structure can be computed at compile time; others must be computed at run time. Arrays are the main example: a one-dimensional array a[i] and a two-dimensional array a[i,j].
One-dimensional arrays
C array name points to 0th element:
[Figure: a points to a[0], followed in memory by a[1], a[2], ...]
a[1] = *(a + 1)
Two-dimensional arrays
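Row-major layout: a[0,0], a[0,1], ..., then a[1,0], a[1,1], ...
For an array with M elements per row, a[i,j] = a[i*M+j], its position in the flattened one-dimensional layout.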
Structures
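Fields within structures are at static offsets from the start of the structure:
struct mystruct { int field1; char field2; };
struct mystruct a, *aptr = &a;
[Figure: aptr points to field1, which occupies 4 bytes, followed by field2.]
With a 4-byte int, field2 is at byte offset 4: aptr->field2 is *((char *)aptr + 4).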
Program optimization
Expression simplification
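Constant folding:
8+1 = 9, so for (i=0; i<8+1; i++) can be compiled as for (i=0; i<9; i++)
Algebraic:
a*b + a*c = a*(b+c) (distributive law)
Strength reduction:
replace an expensive operation with a cheaper one, e.g. a*2 = a<<1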
Dead code elimination
Dead code:
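#define DEBUG 0
if (DEBUG) dbg(p1);
[Figure: CDFG for the if statement: the condition folds to the constant 0, so the 0 branch is always taken and the 1 branch, dbg(p1), is never reached.]
Can be eliminated by analysis of control flow and constant folding. There is no else clause, so the compiler can eliminate the entire if statement.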
Procedure inlining
Eliminates procedure linkage overhead:
Function definition: int foo(int a, int b, int c) { return a + b - c; }
Function call: z = foo(w, x, y);
Inlining result: z = w + x - y;
Loop transformations
Goals:
reduce loop overhead; increase opportunities for pipelining; improve memory system performance.
Loop unrolling
Reduces loop overhead, enables some other optimizations.
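for (i=0; i<N; i++) a[i] = b[i] * c[i];
Assuming N=4, the loop can be fully unrolled:
a[0]=b[0]*c[0]; a[1]=b[1]*c[1]; a[2]=b[2]*c[2]; a[3]=b[3]*c[3];
or partially unrolled by a factor of 2:
for (i=0; i<2; i++) { a[i*2] = b[i*2] * c[i*2]; a[i*2+1] = b[i*2+1] * c[i*2+1]; }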
Loop fusion and distribution
Fusion combines two loops into 1:
Before:
for (i=0; i<N; i++) a[i] = b[i] * 5;
for (j=0; j<N; j++) w[j] = c[j] * d[j];
After:
for (i=0; i<N; i++) { a[i] = b[i] * 5; w[i] = c[i] * d[i]; }
Loop distribution does the reverse: it breaks one loop into multiple loops. This changes the optimizations possible within each loop body.
Loop tiling
Breaks one loop into a nest of loops. The inner loops operate on a subset (tile) of the data. This changes the order of accesses within the array, and so changes cache behavior.
Loop tiling example
Original:
for (i=0; i<N; i++)
    for (j=0; j<N; j++)
        c[i] = a[i][j]*b[i];
Tiled (2 x 2 tiles):
for (i=0; i<N; i+=2)
    for (j=0; j<N; j+=2)
        for (ii=i; ii<min(i+2,N); ii++)
            for (jj=j; jj<min(j+2,N); jj++)
                c[ii] = a[ii][jj]*b[ii];
Array padding
Add array elements to change mapping into cache:
Before: a[0,0] a[0,1] a[0,2] a[1,0] a[1,1] a[1,2]
After: a[0,0] a[0,1] a[0,2] (pad) a[1,0] a[1,1] a[1,2] (pad)