Optimization of Triangular Matrix Functions in BLAS Library on Loongson2F
Yun Xu, Mingzhi Shao, Da Teng
BLAS (Basic Linear Algebra Subprograms) plays a very important role in scientific computing and engineering applications. ATLAS is often recommended as a way to generate an optimized BLAS library. Building on ATLAS, this paper optimizes the triangular matrix functions for the processor-specific architecture of the 750 MHz Loongson 2F. Loop unrolling, instruction scheduling, and data prefetching reduce both computing time and memory access delay, which improves the performance of the functions. Experimental results indicate that these optimization techniques effectively reduce the running time of the functions. After optimization, the double-precision TRSM function reaches 1300 Mflops and the single-precision TRSM function reaches 1800 Mflops. Compared with ATLAS, the performance of TRSM improves by 50% to 60%, and by 100% to 200% for small-scale inputs.
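To illustrate how the three techniques named above fit together, here is a minimal sketch in C. It is not the authors' kernel: it solves a single right-hand side (a TRSV-style forward substitution) rather than the full TRSM, assumes a row-major lower-triangular matrix with a nonzero diagonal, and uses the GCC/Clang builtin `__builtin_prefetch` as a stand-in for Loongson 2F prefetch instructions. The function name `trsv_lower_unrolled` and the unroll factor of 4 are hypothetical choices for illustration.

```c
#include <stddef.h>

/* Solve L * x = b in place: on return, b holds x.
 * L is n-by-n, lower triangular, row-major, with a nonzero diagonal.
 * Illustrative sketch only; not the kernel described in the paper. */
void trsv_lower_unrolled(size_t n, const double *L, double *b)
{
    for (size_t i = 0; i < n; ++i) {
        const double *row = L + i * n;

        /* Data prefetching: hint the next row into cache while this one is used.
         * __builtin_prefetch is a GCC/Clang extension (assumed toolchain). */
        if (i + 1 < n)
            __builtin_prefetch(row + n, 0, 1);

        /* Loop unrolling by 4 with independent accumulators, so the
         * multiply-adds do not form one long dependency chain and the
         * compiler or hardware can schedule them to overlap. */
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        size_t j = 0;
        for (; j + 4 <= i; j += 4) {
            s0 += row[j]     * b[j];
            s1 += row[j + 1] * b[j + 1];
            s2 += row[j + 2] * b[j + 2];
            s3 += row[j + 3] * b[j + 3];
        }
        /* Remainder loop for the last (i mod 4) columns. */
        for (; j < i; ++j)
            s0 += row[j] * b[j];

        b[i] = (b[i] - ((s0 + s1) + (s2 + s3))) / row[i];
    }
}
```

A real TRSM kernel would additionally block over multiple right-hand-side columns and tune the unroll depth and prefetch distance to the target cache and pipeline; those parameters are left out here because the abstract does not state them.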
Conference: Network and Parallel Computing (NPC), pp. 35-45, 2010
DOI: 10.1007/978-3-642-15672-4_5