On the effective implementation of a boundary element code on graphics processing units using an out-of-core LU algorithm

Times cited: 5
Authors
D'Azevedo, E. F. [1]
Fata, S. Nintcheu [1]
Affiliation
[1] Oak Ridge Natl Lab, Comp Sci & Math Div, Oak Ridge, TN 37831 USA
Keywords
Collocation approximation; Boundary element method; Triangulated boundary; Graphics processor; Factorization
DOI
10.1016/j.enganabound.2012.02.014
Chinese Library Classification number
T [Industrial technology]
Discipline classification code
08
Abstract
A collocation boundary element code for solving the three-dimensional Laplace equation, publicly available from http://intetec.org, has been adapted to run on an Nvidia Tesla general-purpose graphics processing unit (GPU). Global matrix assembly and LU factorization of the resulting dense matrix are performed on the GPU. Out-of-core techniques are used to solve problems larger than the available GPU memory. The code achieved a speedup of about 10 times in matrix assembly over a single CPU core and about 56 Gflop/s in the LU factorization while using only 512 Mbytes of GPU memory. Details of the GPU implementation and comparisons with the standard sequential algorithm are included to illustrate the performance of the GPU code. (C) 2012 Elsevier Ltd. All rights reserved.
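The out-of-core LU idea in the abstract (factoring a dense matrix that does not fit in GPU memory) can be illustrated with a generic left-looking blocked LU with partial pivoting. This is a hypothetical sketch, not the authors' GPU code: it is pure Python, the nested list `A` stands in for host memory, the per-panel buffer `panel` stands in for the limited GPU memory, and the names `ooc_lu`, `lu_solve`, and the panel width `nb` are illustrative assumptions. Only one column panel is held in the buffer at a time; previously factored panels are read back to update it, then the panel is factored and written out.

```python
# Hypothetical sketch: generic left-looking out-of-core LU with partial
# pivoting, in pure Python. It is NOT the paper's GPU implementation; the
# nested list `A` stands in for host memory and `panel` for device memory.


def ooc_lu(A, nb):
    """Factor the n x n matrix A (list of row lists) in place as P A = L U.

    Columns are processed in panels of width nb. Only the current panel is
    copied into the local buffer `panel`, mimicking a code whose accelerator
    memory holds one column panel at a time. Returns the pivot list piv:
    at step i, row i was swapped with row piv[i] (LAPACK-style, 0-based).
    On return, A holds unit lower triangular L (below diagonal) and U.
    """
    n = len(A)
    piv = list(range(n))
    for j0 in range(0, n, nb):
        j1 = min(j0 + nb, n)
        w = j1 - j0
        # "Load" columns j0:j1 from host storage into the panel buffer.
        panel = [[A[i][j0 + t] for t in range(w)] for i in range(n)]
        # Replay the row interchanges chosen while factoring earlier panels.
        for i in range(j0):
            p = piv[i]
            if p != i:
                panel[i], panel[p] = panel[p], panel[i]
        # Left-looking update with each previously factored panel k0:k1
        # (in a real out-of-core code these would be streamed to the device).
        for k0 in range(0, j0, nb):
            k1 = min(k0 + nb, j0)
            # Rows k0:k1 become U12 = inv(L11) * panel[k0:k1] (forward
            # substitution); rows k1:n get panel[k1:n] -= L21 * U12.
            for i in range(k0, n):
                for r in range(k0, min(i, k1)):
                    lir = A[i][r]
                    for t in range(w):
                        panel[i][t] -= lir * panel[r][t]
        # Factor rows j0:n of the panel with partial pivoting.
        for jj in range(w):
            c = j0 + jj
            p = max(range(c, n), key=lambda i: abs(panel[i][jj]))
            piv[c] = p
            if p != c:
                panel[c], panel[p] = panel[p], panel[c]
                # Keep the already-stored L columns consistent with the swap.
                A[c][:j0], A[p][:j0] = A[p][:j0], A[c][:j0]
            d = panel[c][jj]
            if d == 0.0:
                raise ValueError("matrix is singular")
            for i in range(c + 1, n):
                panel[i][jj] /= d
                lij = panel[i][jj]
                for t in range(jj + 1, w):
                    panel[i][t] -= lij * panel[c][t]
        # "Store" the factored panel back to host memory.
        for i in range(n):
            for t in range(w):
                A[i][j0 + t] = panel[i][t]
    return piv


def lu_solve(LU, piv, b):
    """Solve A x = b using the in-place factors and pivots from ooc_lu."""
    n = len(LU)
    x = list(b)
    for i in range(n):  # apply the row interchanges: x = P b
        p = piv[i]
        x[i], x[p] = x[p], x[i]
    for i in range(n):  # forward substitution with unit-diagonal L
        for r in range(i):
            x[i] -= LU[i][r] * x[r]
    for i in reversed(range(n)):  # back substitution with U
        for r in range(i + 1, n):
            x[i] -= LU[i][r] * x[r]
        x[i] /= LU[i][i]
    return x


if __name__ == "__main__":
    A = [[4.0, -2.0, 1.0, 3.0],
         [3.0, 6.0, -4.0, 2.0],
         [2.0, 1.0, 8.0, -5.0],
         [1.0, -3.0, 2.0, 7.0]]
    piv = ooc_lu(A, nb=2)
    print(lu_solve(A, piv, [15.0, 11.0, 8.0, 29.0]))  # values close to 1, 2, 3, 4
```

With `nb=1` the sketch reduces to an unblocked left-looking LU, and with `nb=n` the whole matrix is loaded at once; intermediate panel widths trade buffer size against the number of passes over previously factored columns.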
Pages: 1246-1255
Number of pages: 10