On the effective implementation of a boundary element code on graphics processing units using an out-of-core LU algorithm

Cited by: 5
Authors
D'Azevedo, E. F. [1]
Fata, S. Nintcheu [1]
Affiliations
[1] Oak Ridge Natl Lab, Comp Sci & Math Div, Oak Ridge, TN 37831 USA
Keywords
Collocation approximation; Boundary element method; Triangulated boundary; Graphics processor; Factorization
DOI
10.1016/j.enganabound.2012.02.014
Chinese Library Classification
T [Industrial Technology]
Discipline classification code
08
Abstract
A collocation boundary element code for solving the three-dimensional Laplace equation, publicly available from http://intetec.org, has been adapted to run on an Nvidia Tesla general-purpose graphics processing unit (GPU). Global matrix assembly and LU factorization of the resulting dense matrix are performed on the GPU. Out-of-core techniques are used to solve problems larger than the available GPU memory. The code achieved about 10 times speedup in matrix assembly over a single CPU core and about 56 Gflops/s in the LU factorization using only 512 Mbytes of GPU memory. Details of the GPU implementation and comparisons with the standard sequential algorithm are included to illustrate the performance of the GPU code. (C) 2012 Elsevier Ltd. All rights reserved.
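The out-of-core idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' GPU code: it assumes a left-looking, column-panel LU factorization in NumPy, where an ordinary array stands in for the large host-side (or on-disk) matrix and only one panel at a time is copied into the small "device" workspace. The function name `out_of_core_lu` and the `panel` parameter are hypothetical, and pivoting is omitted for brevity, so the sketch is only safe for matrices that need no row exchanges (for example, strictly diagonally dominant ones).

```python
import numpy as np

def out_of_core_lu(A_store, panel):
    """Left-looking LU without pivoting, overwriting A_store with L and U.

    A_store plays the role of slow, large storage (host memory or disk);
    only one column panel plus one previously factored panel is held in
    the fast workspace at any time. Illustrative sketch only.
    """
    n = A_store.shape[0]
    for j0 in range(0, n, panel):
        j1 = min(j0 + panel, n)
        Y = np.array(A_store[:, j0:j1])          # load current panel
        # Apply the updates from every panel factored so far.
        for k0 in range(0, j0, panel):
            k1 = k0 + panel
            Lk = np.array(A_store[k0:, k0:k1])   # load factored panel k
            b = k1 - k0
            Lkk = np.tril(Lk[:b], -1) + np.eye(b)  # unit lower triangle
            # U block of the current panel: solve Lkk @ U = Y[k0:k1]
            Y[k0:k1] = np.linalg.solve(Lkk, Y[k0:k1])
            # Schur-complement update of the rows below
            Y[k1:] -= Lk[b:] @ Y[k0:k1]
        # Unblocked factorization of the updated panel (rows j0 and below).
        for c in range(j1 - j0):
            r = j0 + c
            Y[r + 1:, c] /= Y[r, c]
            Y[r + 1:, c + 1:] -= np.outer(Y[r + 1:, c], Y[r, c + 1:])
        A_store[:, j0:j1] = Y                    # write panel back
    return A_store
```

In a real out-of-core setting the panel width would be chosen so that the two resident panels fit in the available GPU memory, and the triangular solve and matrix product would be done by GPU BLAS routines; those details are outside this sketch.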
Pages: 1246-1255
Number of pages: 10