Effective automatic parallelization of stencil computations

DOI: 10.1145/1250734.1250761. Authors: Sriram Krishnamoorthy, Muthu Manikandan Baskaran, Uday Bondhugula, J.

Citations: 21
Performance optimization of stencil computations has been widely studied in the literature, since they occur in many computationally intensive scientific and engineering applications. Compiler frameworks have also been developed that can transform sequential stencil codes for optimization of data locality and parallelism. However, loop skewing is typically required in order to tile stencil codes along the time dimension, resulting in load imbalance in pipelined parallel execution of the tiles. In this paper, we develop an approach for automatic parallelization of stencil codes, that explicitly addresses the issue of load-balanced execution of tiles. Experimental results are provided that demonstrate the effectiveness of the approach.
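The load imbalance mentioned in the abstract can be made concrete with a small sketch. It is an illustration only and not the paper's method: it assumes a 1D three-point stencil (each point at time t reads its left, center, and right neighbors at time t-1), applies the skew s = i + t so that rectangular tiles become legal, and counts how many tiles can run concurrently on each pipelined wavefront. The function names, tile sizes, and problem sizes are hypothetical choices made for the example.

```python
from collections import Counter


def skewed_tiles(n_points, n_steps, tile_t, tile_s):
    """Return the non-empty tiles (tt, ss) of a 1D stencil after skewing.

    Assumption: a 3-point stencil, so the dependence distances in (t, i)
    are (1, -1), (1, 0), (1, 1). The skew s = i + t turns them into
    (1, 0), (1, 1), (1, 2), which are all non-negative, so rectangular
    tiling of the (t, s) space is legal.
    """
    tiles = set()
    for t in range(n_steps):
        for i in range(n_points):
            s = i + t  # skewed space coordinate
            tiles.add((t // tile_t, s // tile_s))
    return tiles


def wavefront_sizes(tiles):
    """Count tiles per wavefront w = tt + ss.

    Every tile depends only on tiles with smaller or equal tt and ss,
    so tiles that share the same w are mutually independent.
    """
    counts = Counter(tt + ss for tt, ss in tiles)
    return [counts[w] for w in sorted(counts)]


tiles = skewed_tiles(n_points=16, n_steps=8, tile_t=4, tile_s=4)
sizes = wavefront_sizes(tiles)
print(sizes)  # tiles available to run concurrently at each pipeline step
```

The number of tiles per wavefront ramps up at the start and down at the end, so with a fixed number of processors some of them sit idle during pipeline fill and drain. That uneven availability of work is the imbalance the abstract refers to.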
    • ...Many optimizations exist, including e.g., exploitation of data reuse across multiple time steps (out-of-place [24] and in-place [25]) and architecture-specific techniques in one single time step (out-of-place [22] and in-place [26])...
    • ...The performance improvement comes from the pragma implementation that exploits data reuse and parallelism using the overlapped tiling described in [24]...
    • ...In [24], how to automate this transformation is addressed but not when to apply it, which is more important for performance...
    • ...The improvement comes from the pragma implementation, which starts from the split tiling introduced in [24] and then exploits several architecture-specific optimizations, including using SPM and fine-grain synchronization...

    Huimin Cui et al. Extendable pattern-oriented optimization directives

    • ...There has been considerable recent interest in optimization of stencil computations [7], [6], [16], [17], [26], [25], [11], [34], [10], [4], [9], [37], [35], [38], [8], [36], [40], [24], [21], [31]...
    • ...In addition, other transformations such as tiling of stencil computations for multicore architectures have been addressed in [40], [24], [21], [31]...

    Thomas Henretty et al. Data Layout Transformation for Stencil Computations on Short-Vector SI...

    • ...These requirements lead to skewed tiles in the spacetime, see Fig. 2. The tile dimensions form a large optimization space which can be explored empirically [9]–[11] and systematically [12]–[14], whereby it makes a big difference if the exploration targets mainly data locality, or parallelism, or both equally...
    • ...This type of dependence resolution between parallelogram tiles is called split-tiling [12]...
    • ...Wonnacott [8] and Krishnamoorthy et al. [12] deal with multi-processor systems, so in order to reduce the communication, they align the base of the higher parallelogram with the top of the lower one in the split-tiling scheme...

    Robert Strzodka et al. Cache Accurate Time Skewing in Iterative Stencil Computations

    • ...In cache aware time skewing schemes, flat parallelization strategies are applied [11,12,18]...

    Robert Strzodka et al. Cache oblivious parallelograms in iterative stencil computations

    • ...Stencil computations feature abundant parallelism and low computational intensity which offers great opportunity for optimization in temporal and spatial locality, making them effective architectural evaluation benchmarks [4]...
    • ...Krishnamoorthy et al. developed an approach for automatic parallelization of stencil codes, explicitly addressing the issue of load-balanced execution of tiles caused by loop skewing in the time dimension [4]...

    Xudong Fang et al. Optimizing Stencil Application on Multi-thread GPU Architecture Using ...
