A Fast Parallel Matrix Inversion Algorithm based on Heterogeneous Multicore Architectures

Cited by: 0

Authors:

Yu, Denggao [1]

He, Shiwen [1,2]

Huang, Yongming [1]

Yu, Guangshi [1]

Yang, Luxi [1]

Affiliations:

[1] Southeast Univ, Sch Informat Sci & Engn, Nanjing 210096, Jiangsu, Peoples R China

[2] Southeast Univ, Dept Radio Engn, State Key Lab Millimeter Waves, Nanjing 210096, Jiangsu, Peoples R China

Source:

2015 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP) | 2015

Keywords:

matrix inversion; high performance computing; software-defined radio; GPU; CUDA

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Large matrix inversion is a basic step in a wide range of signal processing and numerical problems, such as digital filtering and equalization detection. An algorithm that inverts large matrices quickly and accurately is therefore essential. Meanwhile, the graphics processing unit (GPU) provides a low-cost and flexible multicore architecture for high performance computing, which has attracted many researchers to build GPU-based software-defined radio (SDR). In this paper, we propose a fast parallel algorithm for matrix inversion on heterogeneous multicore architectures that utilizes the computational power of the GPU. Our implementation is based on a modified Squared Givens Rotations (SGR) algorithm, which adapts effectively to the GPU architecture. Implemented with the Compute Unified Device Architecture (CUDA), it achieves a speedup of more than 20x over the CPU-only algorithm when the matrix becomes large, and runs at up to 12.14 GFLOPS on a GeForce GT 620 graphics processor.
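For orientation, the general idea of rotation-based inversion can be sketched in a few lines. This is a minimal, sequential illustration using standard Givens rotations, not the modified SGR algorithm or the CUDA implementation from the paper, whose details the abstract does not give. The function name `givens_inverse` and the singularity tolerance are assumptions made for this sketch. It rotates the augmented matrix [A | I] into [R | Q^T] and then solves R X = Q^T by back substitution.

```python
import math


def givens_inverse(a, tol=1e-12):
    """Invert a square matrix with standard Givens-rotation QR.

    Illustrative only: this is NOT the paper's modified SGR algorithm.
    `tol` is an assumed absolute threshold for detecting singularity.
    """
    n = len(a)
    # Augment A with the identity; rotations turn [A | I] into [R | Q^T].
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]

    for col in range(n):
        for row in range(col + 1, n):
            x, y = m[col][col], m[row][col]
            r = math.hypot(x, y)
            if r == 0.0:
                continue  # entry is already zero, nothing to eliminate
            c, s = x / r, y / r
            # The same 2x2 rotation is applied to every remaining column;
            # each column update is independent of the others.
            for k in range(col, 2 * n):
                top, bot = m[col][k], m[row][k]
                m[col][k] = c * top + s * bot
                m[row][k] = -s * top + c * bot

    if any(abs(m[i][i]) < tol for i in range(n)):
        raise ValueError("matrix is singular or nearly singular")

    # Back substitution: solve R X = Q^T one column at a time.
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n - 1, -1, -1):
            acc = m[i][n + j] - sum(m[i][k] * inv[k][j] for k in range(i + 1, n))
            inv[i][j] = acc / m[i][i]
    return inv
```

Calling `givens_inverse([[4, 3], [6, 3]])` returns a nested list that, multiplied by the input, gives the identity up to rounding error. The inner loop over `k` performs independent per-column updates, which is the kind of work that maps naturally onto many GPU threads. How the paper actually partitions the computation is not stated in the abstract.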


Pages: 903-907

Page count: 5
