Partitioning for Parallel Matrix-Matrix Multiplication with Heterogeneous Processors: The Optimal Solution

Cited by: 8
Authors
DeFlumere, Ashley [1]
Lastovetsky, Alexey [1]
Becker, Brett A. [1]
Affiliation
[1] Univ Coll Dublin, Sch Comp Sci & Informat, Dublin 4, Ireland
Keywords
Parallel Matrix Multiplication; Matrix Partitioning; Heterogeneous Computing; High Performance Computing; LINEAR ALGEBRA PROBLEMS; COMPUTATIONS; NETWORKS;
DOI
10.1109/IPDPSW.2012.12
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
The problem of matrix partitioning for parallel matrix-matrix multiplication on heterogeneous processors has been extensively studied since the mid 1990s. During this time, previous research focused mainly on the design of efficient partitioning algorithms, optimally or sub-optimally partitioning matrices into rectangles. The optimality of the rectangular partitioning shape itself has never been studied or even seriously questioned. The accepted approach is that consideration of non-rectangular shapes will not significantly improve the optimality of the solution, but can significantly complicate the partitioning problem, which is already NP-complete even for the restricted case of rectangular shapes. There is no published research, however, supporting this approach. The shape of the globally optimal partitioning, and how the best rectangular partitioning compares with this global optimum, are still wide open problems. Solution of these problems will decide if new partitioning algorithms searching for truly optimal, and not necessarily rectangular, solutions are needed. This paper presents the first results of our research on the problem of optimal partitioning shapes for parallel matrix-matrix multiplication on heterogeneous processors. Namely, the case of two interconnected processors is comprehensively studied. We prove that, depending on performance characteristics of the processors and the communication link, the globally optimal partitioning will have one of just two well-specified shapes, one of which is rectangular and the other is non-rectangular. The theoretical analysis is conducted using an original mathematical technique proposed in the paper. It is shown that the technique can also be applied in the case of arbitrary numbers of processors. 
While comprehensive analysis of the cases of three and more processors is more complicated and the subject for future work, the paper does prove the optimality of some particular non-rectangular partitioning shapes for some combinations of performance characteristics of heterogeneous processors and communication links. The paper also presents experimental results demonstrating that the optimal non-rectangular partitioning can significantly outperform the optimal rectangular one on real-life heterogeneous HPC platforms.
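To make the rectangular versus non-rectangular comparison concrete, the following is a minimal sketch rather than the paper's method. It assumes a simple communication model that the abstract does not state: matrices A, B and C share one partitioning, and the owner of C[i][j] needs all of row i of A and all of column j of B. The two shapes (`straight_line` and `square_corner`) and every function name are illustrative assumptions; the abstract does not name the optimal shapes.

```python
def comm_volume(owner):
    """Count elements each processor must receive under the assumed model.

    owner[i][j] is the id of the processor that owns element (i, j) of
    A, B and C alike (hypothetical representation for this sketch).
    """
    n = len(owner)
    procs = {p for row in owner for p in row}
    total = 0
    for p in procs:
        # Rows and columns in which p owns at least one element.
        rows = [i for i in range(n) if p in owner[i]]
        cols = [j for j in range(n) if any(owner[i][j] == p for i in range(n))]
        # Elements of A in those rows that p does not own.
        total += sum(1 for i in rows for j in range(n) if owner[i][j] != p)
        # Elements of B in those columns that p does not own.
        total += sum(1 for j in cols for i in range(n) if owner[i][j] != p)
    return total


def straight_line(n, cols_for_p1):
    """Rectangular split: processor 1 owns the last cols_for_p1 columns."""
    return [[1 if j >= n - cols_for_p1 else 0 for j in range(n)]
            for _ in range(n)]


def square_corner(n, side):
    """Non-rectangular split: processor 1 owns a side x side corner square,
    leaving processor 0 with an L-shaped region."""
    return [[1 if i >= n - side and j >= n - side else 0 for j in range(n)]
            for i in range(n)]


# Both layouts give processor 1 the same 16 of 256 elements.
print(comm_volume(straight_line(16, 1)))  # 256
print(comm_volume(square_corner(16, 4)))  # 128
```

Under this assumed model, the corner layout halves the data exchanged when one processor holds only 1/16 of the matrix. This illustrates, but does not prove, why a non-rectangular shape can win for strongly heterogeneous pairs.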
Pages: 125-139
Page count: 15