IRIS-BLAS: Towards a Performance Portable and Heterogeneous BLAS Library

Cited by: 5

Authors:

Miniskar, Narasinga Rao [1]

Monil, Mohammad Alaul Haque [1]

Valero-Lara, Pedro [1]

Liu, Frank [1]

Vetter, Jeffrey S. [1]

Affiliations:

[1] Oak Ridge Natl Lab, Comp Sci & Math Div, Oak Ridge, TN 37830 USA

Source:

2022 IEEE 29TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS, HIPC | 2022

Keywords:

Performance Portable; Heterogeneity; IRIS; BLAS; Tasking

DOI:

10.1109/HiPC56025.2022.00042

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

This paper presents IRIS-BLAS, a novel heterogeneous and performance portable BLAS library. IRIS-BLAS is built on top of the IRIS runtime and multiple vendor and open-source BLAS libraries. It can transparently use all the architectures/devices available in a heterogeneous system, using the appropriate BLAS library based on the task mapping at run time. Thus, IRIS-BLAS is portable across a broad spectrum of architectures and BLAS libraries, alleviating the worry of application developers about modifying the application source code. Even though the emphasis is on portability, IRIS-BLAS provides competitive or even better performance than other state-of-the-art references. Moreover, IRIS-BLAS offers new features such as efficiently using extremely heterogeneous systems composed of multiple GPUs from different hardware vendors.

Pages: 256-261

Page count: 6
