Keywords (7): Acceleration Techniques, Building Block, Dense Linear Algebra, Matrix Multiplication, Optimization Technique, Oscillations, Basic Linear Algebra Subprograms
Accelerating GPU kernels for dense linear algebra
(Citations: 3)
Rajib Nath, Stanimire Tomov, Jack Dongarra
Implementations of the Basic Linear Algebra Subprograms (BLAS) interface are major building blocks of dense linear algebra (DLA) libraries, and therefore have to be highly optimized. We present some techniques and implementations that significantly accelerate the corresponding routines from currently available libraries for GPUs. In particular, Pointer Redirecting, a set of GPU-specific optimization techniques, allows us to easily remove performance oscillations associated with problem dimensions not divisible by fixed blocking sizes. For example, applied to the matrix-matrix multiplication routines, depending on the hardware configuration and routine parameters, this can lead to two times faster algorithms. Similarly, the matrix-vector multiplication can be accelerated more than two times in both single and double precision arithmetic. Additionally, GPU-specific acceleration techniques are applied to develop new kernels (e.g. syrk, symv) that are up to 20× faster than the currently available kernels. We present these kernels and also show their acceleration effect on higher-level dense linear algebra routines. The accelerated kernels are now freely available through the MAGMA BLAS library.
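The abstract does not spell out how Pointer Redirecting works, so the following is only a minimal CPU-side sketch of one plausible reading: every tile is processed at full block size, and a "thread" whose row or column index falls outside the matrix is redirected to the last valid row or column instead of taking a separate boundary path. Its redundant result is simply never stored. The function name `gemm_pointer_redirect`, the `block` parameter, and the clamping rule are assumptions made for illustration, not the MAGMA BLAS implementation.

```python
def gemm_pointer_redirect(A, B, M, N, K, block=4):
    """Compute C = A * B (A is M x K, B is K x N) using full-size tiles.

    Hypothetical illustration: indices past the matrix edge are clamped
    ("redirected") to the last valid row/column so that every tile runs
    the same code path, regardless of whether M and N divide by `block`.
    """
    C = [[0.0] * N for _ in range(M)]
    # The tile grid is rounded up, so edge tiles overhang the matrix.
    for bi in range(0, M, block):
        for bj in range(0, N, block):
            for ti in range(block):        # "thread" row inside the tile
                for tj in range(block):    # "thread" column inside the tile
                    i, j = bi + ti, bj + tj
                    # Redirect out-of-range loads to the last valid
                    # row of A and the last valid column of B.
                    ri = min(i, M - 1)
                    rj = min(j, N - 1)
                    acc = 0.0
                    for k in range(K):
                        acc += A[ri][k] * B[k][rj]
                    # Only threads that own a real element store a result;
                    # redirected threads did redundant work and discard it.
                    if i < M and j < N:
                        C[i][j] = acc
    return C
```

On a GPU the point of such a scheme would be that edge tiles execute the same instruction stream as interior tiles, at the cost of some redundant arithmetic. Calling the function with dimensions such as 5 x 3 times 3 x 6 and `block=4` exercises the overhanging tiles.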
Conference: Vector and Parallel Processing - VECPAR, pp. 83-92, 2010
DOI: 10.1007/978-3-642-19328-6_10
Citation Context (1)
...The CUBLAS dgemm performance and the MAGMA dgetrf/dgetrs performance is reduced when the sizes (or the leading dimensions) of the matrix are not multiples of the inner blocking size [7]...
...The new CUBLAS 3.2 indeed increases performance for non block multiple matrix sizes through MAGMA code [7]...
P. Fortin, et al. Deployment on GPUs of an Application in Computational Atomic Physics
References (3)

Benchmarking GPUs to tune dense linear algebra (Citations: 158)
Vasily Volkov, James Demmel
Conference: Supercomputing Conference - SC, 2008

A Note on Autotuning GEMM for GPUs (Citations: 18)
Yinan Li, Jack Dongarra, Stanimire Tomov
Conference: International Conference on Computational Science - ICCS, pp. 884-892, 2009

Evaluation and tuning of the Level 3 CUBLAS for graphics processors (Citations: 29)
Sergio Barrachina, Maribel Castillo, Francisco D. Igual, Rafael Mayo, Enrique S. Quintana-Ortí
Conference: International Parallel and Distributed Processing Symposium/International Parallel Processing Symposium - IPDPS(IPPS), pp. 1-8, 2008
Citations (3)

Deployment on GPUs of an Application in Computational Atomic Physics
P. Fortin, R. Habel, F. Jezequel, J. L. Lamotte, N. S. Scott
Published in 2011.

An Improved Magma Gemm For Fermi Graphics Processing Units (Citations: 2)
Rajib Nath, Stanimire Tomov, Jack Dongarra
Journal: International Journal of High Performance Computing Applications - IJHPCA, vol. 24, no. 4, pp. 511-515, 2010

An Implementation of the Tile QR Factorization for a GPU and Multiple CPUs (Citations: 1)
Jakub Kurzak, Rajib Nath, Jack Dongarra