IRIS-BLAS: Towards a Performance Portable and Heterogeneous BLAS Library

Cited by: 5
Authors
Miniskar, Narasinga Rao [1]
Monil, Mohammad Alaul Haque [1]
Valero-Lara, Pedro [1]
Liu, Frank [1]
Vetter, Jeffrey S. [1]
Affiliations
[1] Oak Ridge Natl Lab, Comp Sci & Math Div, Oak Ridge, TN 37830 USA
Source
2022 IEEE 29TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS, HIPC | 2022
Keywords
Performance Portable; Heterogeneity; IRIS; BLAS; Tasking;
DOI
10.1109/HiPC56025.2022.00042
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
This paper presents IRIS-BLAS, a novel heterogeneous and performance portable BLAS library. IRIS-BLAS is built on top of the IRIS runtime and multiple vendor and open-source BLAS libraries. It can transparently use all the architectures/devices available in a heterogeneous system, using the appropriate BLAS library based on the task mapping at run time. Thus, IRIS-BLAS is portable across a broad spectrum of architectures and BLAS libraries, sparing application developers from having to modify the application source code. Even though the emphasis is on portability, IRIS-BLAS provides competitive or even better performance than other state-of-the-art references. Moreover, IRIS-BLAS offers new features such as efficiently using extremely heterogeneous systems composed of multiple GPUs from different hardware vendors.
Pages: 256-261
Number of pages: 6