Keywords
(7)
Acceleration Techniques
Building Block
Dense Linear Algebra
Matrix Multiplication
Optimization Technique
Oscillations
Basic Linear Algebra Subprograms
Accelerating GPU kernels for dense linear algebra
Citations: 3
Rajib Nath
Stanimire Tomov
Jack Dongarra
Implementations of the Basic Linear Algebra Subprograms (BLAS) interface are a major building block of dense linear algebra (DLA) libraries, and therefore have to be highly optimized. We present some techniques and implementations that significantly accelerate the corresponding routines from currently available libraries for GPUs. In particular, Pointer Redirecting, a set of GPU specific optimization techniques, allows us to easily remove performance oscillations associated with problem dimensions not divisible by fixed blocking sizes. For example, applied to the matrix-matrix multiplication routines, depending on the hardware configuration and routine parameters, this can lead to two times faster algorithms. Similarly, the matrix-vector multiplication can be accelerated more than two times in both single and double precision arithmetic. Additionally, GPU specific acceleration techniques are applied to develop new kernels (e.g. syrk, symv) that are up to 20× faster than the currently available kernels. We present these kernels and also show their acceleration effect to higher level dense linear algebra routines. The accelerated kernels are now freely available through the MAGMA BLAS library.
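The pointer redirecting idea mentioned in the abstract can be illustrated with a minimal CPU sketch in C. This is not the MAGMA BLAS code: the function name `gemv_redirect`, the blocking size `BLOCK`, and the choice of a matrix-vector product are assumptions made for illustration only. Each inner-loop slot stands in for one GPU thread. A slot whose row index falls past the end of the matrix reads from the last valid row instead of being guarded by a branch in the inner loop, and its result is simply not stored.

```c
#include <stddef.h>

#define BLOCK 4  /* hypothetical fixed blocking size */

/* Computes y = A * x for a column-major m-by-n matrix A with leading
 * dimension lda. Rows are processed in fixed blocks of BLOCK slots,
 * each slot playing the role of one GPU thread. */
void gemv_redirect(size_t m, size_t n, const double *A, size_t lda,
                   const double *x, double *y)
{
    if (m == 0) return;
    for (size_t row0 = 0; row0 < m; row0 += BLOCK) {
        double acc[BLOCK] = {0};
        for (size_t j = 0; j < n; ++j) {
            for (size_t t = 0; t < BLOCK; ++t) {
                size_t row = row0 + t;
                /* Redirect out-of-range rows to the last valid row, so
                 * every slot does the same work on valid memory. */
                size_t src = row < m ? row : m - 1;
                acc[t] += A[src + j * lda] * x[j];
            }
        }
        /* Only slots that map to real rows store their result. */
        for (size_t t = 0; t < BLOCK; ++t)
            if (row0 + t < m)
                y[row0 + t] = acc[t];
    }
}
```

With m = 5 and BLOCK = 4, the second block has three surplus slots. They all reread row 4 and their accumulators are discarded, so the inner loop has the same shape for a block-multiple size and a non-multiple one.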
Conference: Vector and Parallel Processing - VECPAR, pp. 83-92, 2010
DOI: 10.1007/978-3-642-19328-6_10
Citation Context
The CUBLAS dgemm performance and the MAGMA dgetrf/dgetrs performance is reduced when the sizes (or the leading dimensions) of the matrix are not multiples of the inner blocking size [7]
The new CUBLAS 3.2 indeed increases performance for non block multiple matrix sizes through MAGMA code [7]
P. Fortin
Deployment on GPUs of an Application in Computational Atomic Physics
References
Benchmarking GPUs to tune dense linear algebra
Citations: 158
Vasily Volkov
James Demmel
Conference: Supercomputing Conference - SC, 2008
A Note on Autotuning GEMM for GPUs
Citations: 18
Yinan Li
Jack Dongarra
Stanimire Tomov
Conference: International Conference on Computational Science - ICCS, pp. 884-892, 2009
Evaluation and tuning of the Level 3 CUBLAS for graphics processors
Citations: 29
Sergio Barrachina
Maribel Castillo
Francisco D. Igual
Rafael Mayo
Enrique S. Quintana-Ortí
Conference: International Parallel and Distributed Processing Symposium/International Parallel Processing Symposium - IPDPS(IPPS), pp. 1-8, 2008
Deployment on GPUs of an Application in Computational Atomic Physics
P. Fortin
R. Habel
F. Jezequel
J. L. Lamotte
N. S. Scott
Published in 2011.
An Improved Magma Gemm For Fermi Graphics Processing Units
Citations: 2
Rajib Nath
Stanimire Tomov
Jack Dongarra
Journal: International Journal of High Performance Computing Applications - IJHPCA, vol. 24, no. 4, pp. 511-515, 2010
An Implementation of the Tile QR Factorization for a GPU and Multiple CPUs
Citations: 1
Jakub Kurzak
Rajib Nath
Jack Dongarra