Efficient implementation of Jacobi iterative method for large sparse linear systems on graphic processing units

Cited by: 12

Authors:

Ahamed, Abal-Kassim Cheik [1]

Magoules, Frederic [1]

Affiliations:

[1] Univ Paris Saclay, Cent Supelec, Grande Voie Vignes, F-92295 Chatenay Malabry, France

Source:

JOURNAL OF SUPERCOMPUTING | 2017 / Vol. 73 / Issue 08

Keywords:

Jacobi method; GPU; Sparse matrices; CSR format; Finite element method

DOI:

10.1007/s11227-016-1701-3

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

In this paper, an original Jacobi implementation is considered for the solution of sparse linear systems of equations. The proposed algorithm helps to optimize the parallel implementation on GPU. The performance of the GPU-based (CUDA) implementation of this algorithm is compared to that of the corresponding serial CPU-based algorithm. Numerical experiments performed on a set of matrices arising from the finite element discretization of various equations (3D Laplace equation, 3D gravitational potential equation, 3D heat equation) with different meshes illustrate the performance, robustness, and efficiency of the algorithm, with a speedup of up to 23 in double-precision arithmetic.
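For readers unfamiliar with the method named in the abstract, the following is a minimal serial sketch of a textbook Jacobi iteration on a matrix stored in CSR format. It is not the implementation proposed in the paper and it does not run on a GPU. The function name `jacobi_csr`, its parameters, the stopping test, and the small test matrix are all assumptions made for illustration.

```python
def jacobi_csr(row_ptr, col_idx, values, b, tol=1e-10, max_iter=1000):
    """Textbook Jacobi iteration for A x = b, with A stored in CSR format.

    Hypothetical helper for illustration only, not the paper's algorithm.
    row_ptr, col_idx, values are the usual CSR arrays (0-based indices).
    Returns the approximate solution and the number of sweeps performed.
    """
    n = len(b)
    x = [0.0] * n  # initial guess (assumed zero)
    for iteration in range(1, max_iter + 1):
        x_new = [0.0] * n
        # Each row update reads only the previous iterate x, so rows are
        # independent; this is what makes the method easy to parallelize.
        for i in range(n):
            diag = 0.0
            off_diag_sum = 0.0
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j == i:
                    diag = values[k]
                else:
                    off_diag_sum += values[k] * x[j]
            if diag == 0.0:
                raise ZeroDivisionError(f"zero diagonal entry in row {i}")
            x_new[i] = (b[i] - off_diag_sum) / diag
        # Stop when the largest componentwise update falls below tol.
        change = max(abs(x_new[i] - x[i]) for i in range(n))
        x = x_new
        if change < tol:
            return x, iteration
    return x, max_iter


# Small strictly diagonally dominant test system (an assumed example):
# A = [[4, -1, 0], [-1, 4, -1], [0, -1, 4]], b = A @ [1, 2, 3]
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
b = [2.0, 4.0, 10.0]

solution, sweeps = jacobi_csr(row_ptr, col_idx, values, b)
print(solution, sweeps)  # solution should be close to [1.0, 2.0, 3.0]
```

Because every row update depends only on the previous iterate, a parallel version could in principle assign rows to separate threads; how the paper actually maps the work onto the GPU is not described in this record.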

Pages: 3411-3432

Page count: 22
