50 items in total
- [1] Fast and Layout-Oblivious Tensor-Matrix Multiplication with BLAS. COMPUTATIONAL SCIENCE, ICCS 2024, PT I, 2024, 14832: 256-271
- [2] Design of a High-Performance Tensor-Vector Multiplication with BLAS. COMPUTATIONAL SCIENCE - ICCS 2019, PT I, 2019, 11536: 32-45
- [3] SPLATT: Efficient and Parallel Sparse Tensor-Matrix Multiplication. 2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2015: 61-70
- [4] A Pipelined Implementation of the n-mode Tensor-Matrix Multiplication. 2022 29TH IEEE INTERNATIONAL CONFERENCE ON ELECTRONICS, CIRCUITS AND SYSTEMS (IEEE ICECS 2022), 2022
- [5] Design of a High-Performance GEMM-like Tensor-Tensor Multiplication. ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2018, 44 (03)
- [6] Efficient Digital Implementation of n-mode Tensor-Matrix Multiplication. 2021 IEEE 3RD INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS), 2021
- [7] Anatomy of high-performance matrix multiplication. ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2008, 34 (03)
- [8] A family of high-performance matrix multiplication algorithms. APPLIED PARALLEL COMPUTING: STATE OF THE ART IN SCIENTIFIC COMPUTING, 2006, 3732: 256-265
- [9] The Design and Performance of Batched BLAS on Modern High-Performance Computing Systems. INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE (ICCS 2017), 2017, 108: 495-504
- [10] Design of three high-performance concurrent systolic arrays for band matrix multiplication. CHINESE JOURNAL OF ELECTRONICS, 2005, 14 (04): 559-563