On the effective implementation of a boundary element code on graphics processing units using an out-of-core LU algorithm

Cited by: 5
Authors
D'Azevedo, E. F. [1]
Fata, S. Nintcheu [1]
Affiliations
[1] Oak Ridge Natl Lab, Comp Sci & Math Div, Oak Ridge, TN 37831 USA
Keywords
Collocation approximation; Boundary element method; Triangulated boundary; Graphics processor; Factorization
DOI
10.1016/j.enganabound.2012.02.014
Chinese Library Classification
T [Industrial Technology]
Discipline classification code
08
Abstract
A collocation boundary element code for solving the three-dimensional Laplace equation, publicly available from http://intetec.org, has been adapted to run on an Nvidia Tesla general-purpose graphics processing unit (GPU). Global matrix assembly and LU factorization of the resulting dense matrix are performed on the GPU. Out-of-core techniques are used to solve problems larger than the available GPU memory. The code achieved about 10 times speedup in matrix assembly over a single CPU core and about 56 Gflops/s in the LU factorization using only 512 Mbytes of GPU memory. Details of the GPU implementation and comparisons with the standard sequential algorithm are included to illustrate the performance of the GPU code. (C) 2012 Elsevier Ltd. All rights reserved.
Pages: 1246-1255
Page count: 10
Related papers
50 in total
  • [31] An efficient implementation of Bailey and Borwein's algorithm for parallel random number generation on graphics processing units
    Beliakov, Gleb
    Johnstone, Michael
    Creighton, Doug
    Wilkin, Tim
    COMPUTING, 2013, 95 (04) : 309 - 326
  • [32] Accelerating an Imaging Spectroscopy Algorithm for Submerged Marine Environments Using Graphics Processing Units
    Goodman, James A.
    Kaeli, David
    Schaa, Dana
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2011, 4 (03) : 669 - 676
  • [33] Out-of-core diffraction algorithm using multiple SSDs for ultra-high-resolution hologram generation
    Lee, Jaehong
    Kim, Duksu
    OPTICS EXPRESS, 2023, 31 (18) : 28683 - 28700
  • [34] File I/O Cache Performance of Supercomputer Fugaku Using an Out-of-Core Direct Numerical Simulation Code of Turbulence
    Hatanaka, Yuto
    Yamane, Yuki
    Yamaguchi, Kenta
    Soga, Takashi
    Musa, Akihiro
    Ishihara, Takashi
    Uno, Atsuya
    Komatsu, Kazuhiko
    Kobayashi, Hiroaki
    Yokokawa, Mitsuo
    COMPUTATIONAL SCIENCE, ICCS 2024, PT VI, 2024, 14937 : 173 - 187
  • [35] cuFSDAF: An Enhanced Flexible Spatiotemporal Data Fusion Algorithm Parallelized Using Graphics Processing Units
    Gao, Huan
    Zhu, Xiaolin
    Guan, Qingfeng
    Yang, Xue
    Yao, Yao
    Zeng, Wen
    Peng, Xuantong
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [36] High-speed nonlinear finite element analysis for surgical simulation using graphics processing units
    Taylor, Zeike A.
    Cheng, Mario
    Ourselin, Sebastien
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2008, 27 (05) : 650 - 663
  • [37] Real-time nonlinear finite element analysis for surgical simulation using graphics processing units
    Taylor, Zeike A.
    Cheng, Mario
    Ourselin, Sebastien
    MEDICAL IMAGE COMPUTING AND COMPUTER-ASSISTED INTERVENTION - MICCAI 2007, PT 1, PROCEEDINGS, 2007, 4791 : 701 - +
  • [38] Accelerating the Gillespie Exact Stochastic Simulation Algorithm Using Hybrid Parallel Execution on Graphics Processing Units
    Komarov, Ivan
    D'Souza, Roshan M.
    PLOS ONE, 2012, 7 (11):
  • [39] Using general-purpose computing on graphics processing units (GPGPU) to accelerate the ordinary kriging algorithm
    Gutierrez de Rave, E.
    Jimenez-Hornero, F. J.
    Ariza-Villaverde, A. B.
    Gomez-Lopez, J. M.
    COMPUTERS & GEOSCIENCES, 2014, 64 : 1 - 6
  • [40] Visualizing 3D/4D environmental data using many-core graphics processing units (GPUs) and multi-core central processing units (CPUs)
    Li, Jing
    Jiang, Yunfeng
    Yang, Chaowei
    Huang, Qunying
    Rice, Matt
    COMPUTERS & GEOSCIENCES, 2013, 59 : 78 - 89