A Fast Parallel Matrix Inversion Algorithm based on Heterogeneous Multicore Architectures

Cited by: 0

Authors:

Yu, Denggao [1]

He, Shiwen [1,2]

Huang, Yongming [1]

Yu, Guangshi [1]

Yang, Luxi [1]

Affiliations:

[1] Southeast Univ, Sch Informat Sci & Engn, Nanjing 210096, Jiangsu, Peoples R China

[2] Southeast Univ, Dept Radio Engn, State Key Lab Millimeter Waves, Nanjing 210096, Jiangsu, Peoples R China

Source:

2015 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP) | 2015

Keywords:

matrix inversion; high performance computing; software-defined radio; GPU; CUDA

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Large matrix inversion is a basic step in a wide range of signal processing and numerical problems, such as digital filtering and equalization detection. An algorithm that inverts large matrices quickly and accurately is therefore essential. Meanwhile, the graphics processing unit (GPU) provides a low-cost and flexible multicore architecture for high-performance computing, which has attracted many researchers to build GPU-based software-defined radio (SDR). In this paper, we propose a fast parallel algorithm for matrix inversion on heterogeneous multicore architectures that exploits the computational power of the GPU. Our implementation is based on a modified Squared Givens Rotations (SGR) algorithm that adapts effectively to the GPU architecture. Implemented on the Compute Unified Device Architecture (CUDA), it achieves a speedup of more than 20x over the CPU-only algorithm when the matrix becomes large, and runs at up to 12.14 GFLOPS on a GeForce GT 620 graphics processor.
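
For orientation only, the general idea of inverting a matrix through Givens rotations can be sketched as below. This is a minimal CPU sketch that assumes the classical (square-root based) Givens rotation; it does not reproduce the paper's modified SGR variant or its CUDA implementation, and the function name `givens_inverse` is hypothetical. It factors A = QR by zeroing sub-diagonal entries one rotation at a time, then solves R X = Q^T by back substitution to obtain the inverse.

```python
import math


def givens_inverse(a, tol=1e-12):
    """Invert a square matrix via QR built from classical Givens rotations.

    Illustrative only: NOT the modified SGR algorithm from the paper.
    """
    n = len(a)
    r = [[float(x) for x in row] for row in a]  # reduced in place to upper-triangular R
    qt = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # accumulates Q^T

    for j in range(n):                # column being cleared
        for i in range(j + 1, n):     # zero r[i][j] by rotating rows j and i
            if r[i][j] == 0.0:
                continue
            h = math.hypot(r[j][j], r[i][j])
            c, s = r[j][j] / h, r[i][j] / h
            for m in (r, qt):         # apply the same rotation to R and Q^T
                for k in range(n):
                    top, bot = m[j][k], m[i][k]
                    m[j][k] = c * top + s * bot
                    m[i][k] = -s * top + c * bot

    # A = Q R, so A^{-1} = R^{-1} Q^T: back-substitute one column at a time.
    inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        for i in range(n - 1, -1, -1):
            if abs(r[i][i]) < tol:
                raise ValueError("matrix is singular to working precision")
            acc = qt[i][col] - sum(r[i][k] * inv[k][col] for k in range(i + 1, n))
            inv[i][col] = acc / r[i][i]
    return inv


if __name__ == "__main__":
    for row in givens_inverse([[4.0, 3.0], [6.0, 3.0]]):
        print(row)
```

Each rotation touches only two rows, and the columns of the inverse are solved independently of one another; this kind of independence is what a parallel implementation would presumably exploit, although the mapping used in the paper is not described in this record.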

Pages: 903-907

Page count: 5
