A Fast Parallel Matrix Inversion Algorithm based on Heterogeneous Multicore Architectures

Cited by: 0
Authors
Yu, Denggao [1]
He, Shiwen [1, 2]
Huang, Yongming [1]
Yu, Guangshi [1]
Yang, Luxi [1]
Affiliations
[1] Southeast Univ, Sch Informat Sci & Engn, Nanjing 210096, Jiangsu, Peoples R China
[2] Southeast Univ, Dept Radio Engn, State Key Lab Millimeter Waves, Nanjing 210096, Jiangsu, Peoples R China
Keywords
matrix inversion; high performance computing; software-defined radio; GPU; CUDA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Large-matrix inversion is a basic step in a wide range of signal processing and numerical problems, such as digital filtering and equalization detection. It is therefore essential to find an algorithm that inverts large matrices quickly and accurately. Meanwhile, the graphics processing unit (GPU) provides a low-cost and flexible multicore architecture for high-performance computing, which has drawn many researchers to building GPU-based software-defined radio (SDR). In this paper, we propose a fast parallel algorithm for matrix inversion on heterogeneous multicore architectures that exploits the computational power of the GPU. Our implementation is based on a modified Squared Givens Rotations (SGR) algorithm, which adapts effectively to the GPU architecture. Implemented on the Compute Unified Device Architecture (CUDA), it achieves a speedup of more than 20x over the CPU-only algorithm when the matrix becomes large, and runs at up to 12.14 gigaflops on a GeForce GT620 graphics processor.
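For orientation only, the general idea of rotation-based inversion can be sketched as follows. This is a minimal, sequential Python illustration that uses standard Givens rotations, not the modified Squared Givens Rotations variant or the CUDA implementation described in the abstract, whose details are not given here. The function name `givens_inverse` and the singularity tolerance are assumptions made for this sketch.

```python
import math


def givens_inverse(a, tol=1e-12):
    """Invert a square matrix via QR with standard Givens rotations.

    Illustrative sketch only: NOT the paper's modified SGR algorithm.
    `tol` is an arbitrary, scale-dependent singularity threshold (assumption).
    """
    n = len(a)
    r = [[float(v) for v in row] for row in a]  # working copy, becomes R
    # qt accumulates the rotations, so that qt * a = R at the end.
    qt = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    for j in range(n):                  # column being cleared
        for i in range(n - 1, j, -1):   # zero r[i][j] against row i - 1
            x, y = r[i - 1][j], r[i][j]
            if y == 0.0:
                continue                # already zero, no rotation needed
            h = math.hypot(x, y)
            c, s = x / h, y / h
            for m in (r, qt):           # apply the same rotation to both
                top, bot = m[i - 1], m[i]
                m[i - 1] = [c * t + s * b for t, b in zip(top, bot)]
                m[i] = [-s * t + c * b for t, b in zip(top, bot)]

    # a^-1 = R^-1 * qt: solve R * x = (column of qt) by back substitution.
    inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        for i in range(n - 1, -1, -1):
            if abs(r[i][i]) < tol:
                raise ValueError("matrix is singular to working precision")
            acc = qt[i][col] - sum(r[i][k] * inv[k][col] for k in range(i + 1, n))
            inv[i][col] = acc / r[i][i]
    return inv
```

Calling `givens_inverse([[4.0, 7.0], [2.0, 6.0]])` returns a nested list approximating the inverse of that 2x2 matrix. Rotations acting on disjoint row pairs are independent of one another, which is the property that generally makes Givens-style schemes candidates for parallel hardware. The square-root-free SGR formulation and the GPU mapping used by the authors are not reproduced here.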
Pages: 903 - 907
Number of pages: 5
Related papers
50 in total
  • [31] ParaStream: A parallel streaming Delaunay triangulation algorithm for LiDAR points on multicore architectures
    Wu, Huayi
    Guan, Xuefeng
    Gong, Jianya
    COMPUTERS & GEOSCIENCES, 2011, 37 (09) : 1355 - 1363
  • [32] A fast local algorithm for track reconstruction on parallel architectures
    Perez, Daniel Hugo Campora
    Neufeld, Niko
    Riscos Nunez, Agustin
    2019 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2019, : 698 - 707
  • [33] Fast and Portable Locking for Multicore Architectures
    Lozi, Jean-Pierre
    David, Florian
    Thomas, Gael
    Lawall, Julia
    Muller, Gilles
    ACM TRANSACTIONS ON COMPUTER SYSTEMS, 2016, 33 (04):
  • [34] Network Coding Parallelization Based on Matrix Operations for Multicore Architectures
    Wunderlich, Simon
    Cabrera, Juan
    Fitzek, Frank H. P.
    Pedersen, Morten V.
    2015 IEEE INTERNATIONAL CONFERENCE ON UBIQUITOUS WIRELESS BROADBAND (ICUWB), 2015,
  • [35] Parallel Two-Sided Matrix Reduction to Band Bidiagonal Form on Multicore Architectures
    Ltaief, Hatem
    Kurzak, Jakub
    Dongarra, Jack
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2010, 21 (04) : 417 - 423
  • [36] Time Based Agent Garbage Collection Algorithm for Multicore Architectures
    Muneeswari, G.
    Shunmuganathan, K. L.
    PROCEEDINGS OF THE 2012 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI'12), 2012, : 215 - 219
  • [37] Dynamic partitioning-based JPEG decompression on heterogeneous multicore architectures
    Sodsong, Wasuwee
    Hong, Jingun
    Chung, Seongwook
    Lim, Yeongkyu
    Kim, Shin-Dug
    Burgstaller, Bernd
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2016, 28 (02) : 517 - 536
  • [38] An Efficient Pipelined Parallel Join Algorithm on Heterogeneous Distributed Architectures
    Hassan, Mohamad Al Hajj
    Bamha, Mostafa
    SOFTWARE AND DATA TECHNOLOGIES, 2009, 47 : 119 - 133
  • [39] Performance Modelling of Heterogeneous ISA Multicore Architectures
    Boran, Nirmal Kumar
    Meghwal, Rameshwar Prasad
    Sharma, Kuldeep
    Kumar, Binod
    Singh, Virendra
    PROCEEDINGS OF 2016 IEEE EAST-WEST DESIGN & TEST SYMPOSIUM (EWDTS), 2016,
  • [40] Parallel tiled QR factorization for multicore architectures
    Buttari, Alfredo
    Langou, Julien
    Kurzak, Jakub
    Dongarra, Jack
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2008, 20 (13) : 1573 - 1590