50 items in total
- [41] Performance evaluation of kernel fusion BLAS routines on the GPU: iterative solvers as case study The Journal of Supercomputing, 2014, 70: 577 - 587
- [42] Performance of level 3 BLAS kernels in a dynamically partitioned data-flow environment COMPUTING SYSTEMS IN ENGINEERING, 1995, 6 (4-5): 357 - 361
- [43] Performance evaluation of kernel fusion BLAS routines on the GPU: iterative solvers as case study JOURNAL OF SUPERCOMPUTING, 2014, 70 (02): 577 - 587
- [44] Closing the gap: CPU and FPGA trends in sustainable floating-point BLAS performance 12TH ANNUAL IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, PROCEEDINGS, 2004: 219 - 228
- [45] Performance data of multiple-precision scalar and vector BLAS operations on CPU and GPU DATA IN BRIEF, 2020, 30
- [47] Model-driven Level 3 BLAS Performance Optimization on Loongson 3A Processor PROCEEDINGS OF THE 2012 IEEE 18TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS 2012), 2012: 684 - 691
- [48] XKBlas: a High Performance Implementation of BLAS-3 Kernels on Multi-GPU Server 2020 28TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING (PDP 2020), 2020: 1 - 8
- [49] GEMM-based level 3 BLAS: High-performance model implementations and performance evaluation benchmark ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 1998, 24 (03): 268 - 302
- [50] Performance modeling and optimal block size selection for a BLAS-3 based tridiagonalization algorithm Eighth International Conference on High-Performance Computing in Asia-Pacific Region, Proceedings, 2005: 249 - 256