Code Transformation

The document discusses three techniques for improving program performance by increasing parallelism in fully permutable loops: 1. Exploiting fully permutable loops allows loops to execute in parallel by creating a loop nest transform from independent solutions to time-partition constraints. 2. Wavefronting partitions loop computation using an index variable that is a combination of all permutable loop indices, grouping iterations along diagonals for parallel execution. 3. Blocking aggregates loop iterations into blocks that can be assigned to processors, enhancing data locality and reducing pipelining overhead.

Code Transformations

Exploiting Fully Permutable Loops:


Exploiting fully permutable loops is a technique for improving
program performance by increasing parallelism. A set of loops is
fully permutable when the loops can be interchanged in any order
without violating a data dependence; such a loop nest can then be
pipelined, wavefronted, or blocked so that iterations run in parallel.

The technique creates a loop nest with k outermost fully permutable
loops from k independent solutions to the time-partition constraints:
the kth solution becomes the kth row of the new transform. Once the
affine transform is created, a code-generation algorithm can be used
to produce the transformed loop nest.
• The solutions found in the SOR (Successive Over-Relaxation)
example were [1 0] and [1 1]. Making the first solution the first
row and the second solution the second row gives the transform
[[1 0], [1 1]]. Making the second solution the first row instead
gives the transform [[1 1], [1 0]].
• This technique is useful because it allows the program to take
advantage of the parallelism present in the loop nest, which can
lead to a significant increase in performance.
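
As a rough illustration of how the rows of the transform are used, the following C sketch builds the 2x2 transform from the two SOR solutions and maps an original iteration (i, j) to its new loop indices. The function names build_transform and apply_transform are hypothetical and chosen only for this example, and the determinant check is just one simple way to confirm that two 2-D solutions are independent.

```c
#include <assert.h>

/* Hypothetical helpers for illustration; not taken from the slides. */

/* Build a 2x2 transform whose first row is the first solution and
 * whose second row is the second solution. */
void build_transform(const int first[2], const int second[2], int T[2][2])
{
    for (int c = 0; c < 2; c++) {
        T[0][c] = first[c];
        T[1][c] = second[c];
    }
    /* The two solutions must be linearly independent (nonzero determinant). */
    assert(T[0][0] * T[1][1] - T[0][1] * T[1][0] != 0);
}

/* Map an original iteration (i, j) to the new loop indices T * (i, j). */
void apply_transform(int T[2][2], int i, int j, int out[2])
{
    out[0] = T[0][0] * i + T[0][1] * j;
    out[1] = T[1][0] * i + T[1][1] * j;
}
```

With rows [1 0] and [1 1], the original iteration (2, 3) maps to (2, 5); with the rows swapped it maps to (5, 2).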
Wavefronting:

• It is also easy to generate k - 1 inner parallelizable loops from a loop
nest with k outermost fully permutable loops. Although pipelining is
preferable, we include this information here for completeness.

• We partition the computation of a loop nest with k outermost fully
permutable loops using a new index variable i’, where i’ is defined to be
some combination of all the indices in the k permutable loops; for example,
the sum of the k indices is one such combination.

• We create an outermost sequential loop that iterates through the i’
partitions in increasing order; the computation nested within each partition
is ordered as before. The first k - 1 loops within each partition are
guaranteed to be parallelizable. Intuitively, given a two-dimensional
iteration space, this transform groups the iterations along each 135°
diagonal into one execution of the outermost loop. This strategy guarantees
that the iterations within each iteration of the outermost loop have no data
dependence on one another.
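
The wavefront idea can be sketched in C for a two-dimensional case. This is a minimal example under stated assumptions: the combination is taken to be i’ = i + j (written ip in the code), and the loop body is a toy recurrence in which each element depends on its upper and left neighbours. The size N and all function names are hypothetical. The inner loop is written sequentially here, but its iterations lie on a single diagonal and do not depend on each other, so they could be distributed across processors.

```c
#include <assert.h>

#define N 6  /* hypothetical problem size, chosen only for illustration */

/* Set the first row and first column to 1 and everything else to 0. */
void init_borders(long a[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = (i == 0 || j == 0) ? 1 : 0;
}

/* Original order: a[i][j] depends on a[i-1][j] and a[i][j-1]. */
void fill_sequential(long a[N][N])
{
    for (int i = 1; i < N; i++)
        for (int j = 1; j < N; j++)
            a[i][j] = a[i - 1][j] + a[i][j - 1];
}

/* Wavefront order: the outer loop walks the partitions ip = i + j in
 * increasing order; both dependences point to the partition ip - 1. */
void fill_wavefront(long a[N][N])
{
    for (int ip = 2; ip <= 2 * (N - 1); ip++) {
        int lo = (ip - (N - 1) > 1) ? ip - (N - 1) : 1;
        int hi = (ip - 1 < N - 1) ? ip - 1 : N - 1;
        /* Iterations of this loop are independent of one another. */
        for (int i = lo; i <= hi; i++) {
            int j = ip - i;
            assert(j >= 1 && j < N);
            a[i][j] = a[i - 1][j] + a[i][j - 1];
        }
    }
}

/* Return 1 if the two grids hold the same values, 0 otherwise. */
int grids_equal(long a[N][N], long b[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (a[i][j] != b[i][j])
                return 0;
    return 1;
}
```

Filling one grid with fill_sequential and another with fill_wavefront from the same initial borders should give identical results, which is a quick way to check that the reordering respects the dependences.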
Blocking:

• A k-deep, fully permutable loop nest can be blocked in k dimensions.
Instead of assigning the iterations to processors based on the value of the
outer or inner loop indexes, we can aggregate blocks of iterations into one
unit. Blocking is useful for enhancing data locality as well as for minimizing
the overhead of pipelining.
• Before: a simple loop nest.

    for (i = 0; i < n; i++)
        for (j = 1; j < n; j++) {
            <S>
        }

• After: a blocked version of this loop nest, with block size b.

    for (ii = 0; ii < n; ii += b)
        for (jj = 1; jj < n; jj += b)
            for (i = ii; i < min(ii + b, n); i++)
                for (j = jj; j < min(jj + b, n); j++) {
                    <S>
                }
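
One way to convince yourself that a blocked nest executes the same iterations as the simple nest is to count visits, as in the C sketch below. It assumes one common formulation in which ii and jj are the element indices of a block's origin, so each block runs from ii to min(ii + b, n) - 1 and from jj to min(jj + b, n) - 1. The function names and the visits array are hypothetical and stand in for the statement <S>.

```c
#include <assert.h>

static int min_int(int x, int y) { return x < y ? x : y; }

/* Simple nest: visits (i, j) for 0 <= i < n and 1 <= j < n. */
void visit_plain(int n, int *visits)
{
    for (int i = 0; i < n; i++)
        for (int j = 1; j < n; j++)
            visits[i * n + j]++;  /* stands in for <S> */
}

/* Blocked nest: the same iterations, grouped into b-by-b blocks. */
void visit_blocked(int n, int b, int *visits)
{
    assert(b > 0);
    for (int ii = 0; ii < n; ii += b)
        for (int jj = 1; jj < n; jj += b)
            for (int i = ii; i < min_int(ii + b, n); i++)
                for (int j = jj; j < min_int(jj + b, n); j++)
                    visits[i * n + j]++;  /* stands in for <S> */
}
```

Running both functions on zero-initialized arrays of n * n counters should leave the two arrays identical, including when n is not a multiple of b; only the order in which iterations execute differs.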
