The document discusses three techniques for improving program performance by increasing parallelism in fully permutable loops:
1. Exploiting fully permutable loops allows loops to execute in parallel by creating a loop nest transform from independent solutions to time-partition constraints.
2. Wavefronting partitions loop computation using an index variable that is a combination of all permutable loop indices, grouping iterations along diagonals for parallel execution.
3. Blocking aggregates loop iterations into blocks that can be assigned to processors, enhancing data locality and reducing pipelining overhead.
Code Transformations
Exploiting Fully Permutable Loops:
Exploiting fully permutable loops is a technique for improving program performance by increasing parallelism. A set of loops is fully permutable when the loops can be reordered in any way without violating a data dependence. That freedom is what allows the loop nest to be transformed so that its iterations execute in parallel.
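To make "can be reordered in any way" concrete, here is a minimal sketch in C. It assumes a hypothetical SOR-like stencil in which each element depends on its upper and left neighbours, so the dependence vectors are (1,0) and (0,1). Both have non-negative components, so the two loops are fully permutable, and running i outermost or j outermost should give the same array. The names N, init, stencil_ij, stencil_ji and same_result are illustrative and do not come from the original slides.

```c
#include <string.h>

#define N 8  /* illustrative problem size (assumption) */

/* Boundary values in row 0 and column 0, zero elsewhere. */
static void init(int a[N][N]) {
    memset(a, 0, sizeof(int) * N * N);
    for (int k = 0; k < N; k++) {
        a[0][k] = k + 1;
        a[k][0] = k + 1;
    }
}

/* Original order: i outermost, j innermost. */
static void stencil_ij(int a[N][N]) {
    for (int i = 1; i < N; i++)
        for (int j = 1; j < N; j++)
            a[i][j] = a[i - 1][j] + a[i][j - 1];
}

/* Permuted order: j outermost, i innermost. */
static void stencil_ji(int a[N][N]) {
    for (int j = 1; j < N; j++)
        for (int i = 1; i < N; i++)
            a[i][j] = a[i - 1][j] + a[i][j - 1];
}

/* Returns 1 if both loop orders produce identical arrays. */
int same_result(void) {
    int x[N][N], y[N][N];
    init(x);
    stencil_ij(x);
    init(y);
    stencil_ji(y);
    return memcmp(x, y, sizeof x) == 0;
}
```

Calling same_result() from a small main is enough to check the claim for this particular stencil; it says nothing about loop nests whose dependences have negative components.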
The technique involves creating a loop nest with k outermost fully permutable loops from k independent solutions to the time-partition constraints. This is done by making the kth solution the kth row of the new transform. Once the affine transform is created, an algorithm can be used to generate the code.
• The solutions found in the SOR (Successive Over-Relaxation) example were [1 0] and [1 1]. Making the first solution the first row and the second solution the second row gives the transform $\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$. Making the second solution the first row instead gives the transform $\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$.
• This technique is useful because it allows the program to take advantage of the parallelism present in the loop nest, which can lead to a significant increase in performance.
Wavefronting:
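For a two-deep nest, one concrete choice of combined index is i' = i + j, which is the first row [1 1] of the second SOR transform above. The following is a minimal sketch in C of that wavefront order, assuming a hypothetical stencil with dependences (1,0) and (0,1). The names WN, wf_init, wf_serial, wf_wavefront and wavefront_matches_serial are illustrative, not from the original. The inner loop is deliberately run backwards to suggest that the order within one diagonal does not matter.

```c
#include <string.h>

#define WN 8  /* illustrative problem size (assumption) */

static int wf_min(int x, int y) { return x < y ? x : y; }
static int wf_max(int x, int y) { return x > y ? x : y; }

/* Boundary values in row 0 and column 0, zero elsewhere. */
static void wf_init(int a[WN][WN]) {
    memset(a, 0, sizeof(int) * WN * WN);
    for (int k = 0; k < WN; k++) {
        a[0][k] = k + 1;
        a[k][0] = k + 1;
    }
}

/* Reference: the original sequential order. */
static void wf_serial(int a[WN][WN]) {
    for (int i = 1; i < WN; i++)
        for (int j = 1; j < WN; j++)
            a[i][j] = a[i - 1][j] + a[i][j - 1];
}

/* Wavefront order: the outer loop walks the diagonals p = i + j
 * sequentially. Every iteration on diagonal p reads only values from
 * diagonal p - 1 (or the boundary), so the inner loop carries no
 * dependence and is a candidate for parallel execution. */
static void wf_wavefront(int a[WN][WN]) {
    for (int p = 2; p <= 2 * (WN - 1); p++) {
        int lo = wf_max(1, p - (WN - 1));  /* keeps j = p - i <= WN - 1 */
        int hi = wf_min(WN - 1, p - 1);    /* keeps j = p - i >= 1 */
        for (int i = hi; i >= lo; i--) {   /* reversed on purpose */
            int j = p - i;
            a[i][j] = a[i - 1][j] + a[i][j - 1];
        }
    }
}

/* Returns 1 if the wavefront order reproduces the sequential result. */
int wavefront_matches_serial(void) {
    int x[WN][WN], y[WN][WN];
    wf_init(x);
    wf_serial(x);
    wf_init(y);
    wf_wavefront(y);
    return memcmp(x, y, sizeof x) == 0;
}
```

The sketch runs the inner loop sequentially; distributing it across threads or processors is left out so that the example stays self-contained.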
• It is also easy to generate k − 1 inner parallelizable loops from a loop nest with k outermost fully permutable loops. Although pipelining is preferable, we include this information here for completeness.
• We partition the computation of a loop nest with k outermost fully permutable loops using a new index variable i’, where i’ is defined to be some combination of all the indices in the k-deep permutable loop nest. For a two-deep nest with indices i and j, i’ = i + j is one such combination.
• We create an outermost sequential loop that iterates through the i’ partitions in increasing order; the computation nested within each partition is ordered as before. The first k − 1 loops within each partition are guaranteed to be parallelizable. Intuitively, given a two-dimensional iteration space, this transform groups iterations along 135° diagonals as an execution of the outermost loop. This strategy guarantees that the iterations within each iteration of the outermost loop have no data dependence.
Blocking:
• A k-deep, fully permutable loop nest can be blocked in k dimensions. Instead of assigning the iterations to processors based on the value of the outer or inner loop indexes, we can aggregate blocks of iterations into one unit. Blocking is useful for enhancing data locality as well as for minimizing the overhead of pipelining.
• Before: a simple loop nest.
    for (i = 0; i < n; i++)
        for (j = 1; j < n; j++) {
            <S>
        }
• After: the blocked version of this loop nest, with block size b. The loops over ii and jj step through the block origins, and the loops over i and j cover one b × b block, clipped at n.
    for (ii = 0; ii < n; ii += b)
        for (jj = 1; jj < n; jj += b)
            for (i = ii; i < min(ii + b, n); i++)
                for (j = jj; j < min(jj + b, n); j++) {
                    <S>
                }
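A minimal sketch in C of two-dimensional blocking, assuming a hypothetical loop body that stands in for <S> and reads the left neighbour and, where one exists, the upper neighbour. The sizes BN and BB and the names blk_init, blk_body, blk_simple, blk_blocked and blocked_matches_simple are illustrative, not from the original. The block size is chosen so that it does not divide the problem size, which exercises the clipping with min.

```c
#include <string.h>

#define BN 10  /* illustrative problem size (assumption) */
#define BB 4   /* illustrative block size; deliberately not a divisor of BN */

static int blk_min(int x, int y) { return x < y ? x : y; }

/* Column 0 holds boundary values; everything else starts at zero. */
static void blk_init(int a[BN][BN]) {
    memset(a, 0, sizeof(int) * BN * BN);
    for (int i = 0; i < BN; i++)
        a[i][0] = i + 1;
}

/* Stand-in for <S>: uses the left and (if present) upper neighbour. */
static void blk_body(int a[BN][BN], int i, int j) {
    int up = (i > 0) ? a[i - 1][j] : 0;
    a[i][j] = a[i][j - 1] + up;
}

/* The simple loop nest: i from 0, j from 1. */
static void blk_simple(int a[BN][BN]) {
    for (int i = 0; i < BN; i++)
        for (int j = 1; j < BN; j++)
            blk_body(a, i, j);
}

/* Blocked version: ii and jj pick a block origin, i and j sweep
 * one BB x BB block, clipped at the array bounds. */
static void blk_blocked(int a[BN][BN]) {
    for (int ii = 0; ii < BN; ii += BB)
        for (int jj = 1; jj < BN; jj += BB)
            for (int i = ii; i < blk_min(ii + BB, BN); i++)
                for (int j = jj; j < blk_min(jj + BB, BN); j++)
                    blk_body(a, i, j);
}

/* Returns 1 if the blocked nest reproduces the simple nest's result. */
int blocked_matches_simple(void) {
    int x[BN][BN], y[BN][BN];
    blk_init(x);
    blk_simple(x);
    blk_init(y);
    blk_blocked(y);
    return memcmp(x, y, sizeof x) == 0;
}
```

The results agree here because each block only needs values from the block to its left and the block above it, both of which are visited earlier in the ii, jj order.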