Partitioning for Parallel Matrix-Matrix Multiplication with Heterogeneous Processors: The Optimal Solution

Cited by: 8
Authors
DeFlumere, Ashley [1]
Lastovetsky, Alexey [1]
Becker, Brett A. [1]
Affiliation
[1] University College Dublin, School of Computer Science and Informatics, Dublin 4, Ireland
Keywords
Parallel Matrix Multiplication; Matrix Partitioning; Heterogeneous Computing; High Performance Computing; Linear Algebra Problems; Computations; Networks
DOI
10.1109/IPDPSW.2012.12
Chinese Library Classification number
TP301 [Theory and Methods]
Subject classification code
081202
Abstract
The problem of matrix partitioning for parallel matrix-matrix multiplication on heterogeneous processors has been extensively studied since the mid-1990s. During this time, previous research focused mainly on the design of efficient partitioning algorithms, optimally or sub-optimally partitioning matrices into rectangles. The optimality of the rectangular partitioning shape itself has never been studied or even seriously questioned. The accepted approach is that considering non-rectangular shapes will not significantly improve the optimality of the solution, but can significantly complicate the partitioning problem, which is already NP-complete even for the restricted case of rectangular shapes. There is no published research, however, supporting this approach. The shape of the globally optimal partitioning, and how the best rectangular partitioning compares with this global optimum, are still wide open problems. The solution of these problems will determine whether new partitioning algorithms searching for truly optimal, and not necessarily rectangular, solutions are needed. This paper presents the first results of our research on the problem of optimal partitioning shapes for parallel matrix-matrix multiplication on heterogeneous processors. Namely, the case of two interconnected processors is comprehensively studied. We prove that, depending on the performance characteristics of the processors and the communication link, the globally optimal partitioning will have one of just two well-specified shapes, one of which is rectangular and the other non-rectangular. The theoretical analysis is conducted using an original mathematical technique proposed in the paper. It is shown that the technique can also be applied to arbitrary numbers of processors.
While comprehensive analysis of the cases of three and more processors is more complicated and the subject of future work, the paper does prove the optimality of some particular non-rectangular partitioning shapes for some combinations of performance characteristics of heterogeneous processors and communication links. The paper also presents experimental results demonstrating that the optimal non-rectangular partitioning can significantly outperform the optimal rectangular one on real-life heterogeneous HPC platforms.
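To make the rectangular versus non-rectangular trade-off for two processors concrete, the following is a minimal sketch under a simplified cost model of our own, not the paper's: A, B and C are N x N and partitioned identically, the slower processor receives a share of the area proportional to its speed (fast-to-slow speed ratio r), and cost is measured only as the number of matrix elements each processor must receive. The two candidate shapes (a full-height column strip and a square in one corner) and the helper names `comm_volume`, `straight_line` and `square_corner` are illustrative assumptions; the paper's own analysis also accounts for processor speeds and link characteristics in execution time.

```python
"""Sketch: communication volume of two partition shapes for C = A x B on two processors.

Assumptions (ours, for illustration only): A, B and C are N x N and partitioned
identically; a processor computes the elements of C it owns, so for C[i][j] it
needs all of row i of A and all of column j of B; it must receive every such
element it does not own. Volume is counted in elements; latency is ignored.
"""
import math


def comm_volume(owner):
    """Total elements received by all processors for the ownership grid `owner`.

    owner[i][j] is the id of the processor owning element (i, j) of A, B and C.
    """
    n = len(owner)
    total = 0
    for p in {q for row in owner for q in row}:
        # Rows of A and columns of B that p needs because it owns some C[i][j].
        rows = {i for i in range(n) for j in range(n) if owner[i][j] == p}
        cols = {j for i in range(n) for j in range(n) if owner[i][j] == p}
        # Elements of A in those rows that p does not own.
        total += sum(1 for i in rows for k in range(n) if owner[i][k] != p)
        # Elements of B in those columns that p does not own.
        total += sum(1 for j in cols for k in range(n) if owner[k][j] != p)
    return total


def straight_line(n, r):
    """Hypothetical rectangular shape: slower processor (id 1) owns a column strip."""
    w = round(n / (1 + r))  # strip width giving ~1/(1+r) of the area
    return [[1 if j >= n - w else 0 for j in range(n)] for _ in range(n)]


def square_corner(n, r):
    """Hypothetical non-rectangular shape: slower processor (id 1) owns a corner square."""
    s = round(n / math.sqrt(1 + r))  # side giving ~1/(1+r) of the area
    return [[1 if i >= n - s and j >= n - s else 0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    n = 60
    for r in (1, 2, 3, 5, 10):
        sl = comm_volume(straight_line(n, r))
        sc = comm_volume(square_corner(n, r))
        print(f"r={r:>2}: strip={sl}, corner square={sc}")
```

Under this model the strip always moves N^2 elements whatever the ratio, whereas a corner square of side s moves 2Ns (2s(N - s) received by the slower processor plus 2s^2 by the faster one). The non-rectangular shape therefore communicates less exactly when s < N/2, that is, when the slower processor holds under a quarter of the matrix (r > 3). Integer rounding of the strip width and square side shifts the totals slightly for small N.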
Pages: 125-139
Page count: 15